We appreciate the target article's criticism of existing social science experimental methods: Results are often presented without clear boundary conditions and without making it easy to compare results across labs. We also appreciate the target article's main prescription for empirically addressing these issues: Systematically exploring the parameters that vary across existing studies and theories. However, in line with recent criticisms of the social sciences (e.g., Muthukrishna & Henrich, 2019), we believe this prescription only takes us halfway; good theorizing is still essential.
We wish to highlight two features of theories that seem especially imperative: (1) Well-specified: Theories should specify a causal process; otherwise it is difficult to generalize out of sample. (2) Grounded: The specified causal process should not “beg for explanation,” but instead be explicable in terms of well-understood processes; otherwise it is harder to build up a coherent scientific enterprise and theories might only superficially be adding explanatory power. These two features are intuitively appealing (e.g., Ahn, Kalish, Medin, & Gelman, 1995; Pacer & Lombrozo, 2017), prescribed by philosophers of science (Lakatos, 1978; Pearl, 2000; Woodward, 2003), and can be justified using Bayesian models (Goodman, Ullman, & Tenenbaum, 2011; Griffiths & Tenenbaum, 2009).
Social science theories can satisfy these two properties. Evolutionary game-theoretic approaches to moral psychology provide one exemplar (e.g., Hoffman & Yoeli, 2022; Quillien, 2020). For instance, one account of why we donate to ineffective charities posits that we are partially motivated to give by the reputational benefits, and these reputational benefits can only depend on information that is easy for others to ascertain and agree upon – like whether you gave but not the impact of your gift (Burum, Nowak, & Hoffman, 2020). This account is “grounded” in the sense that it rests on premises that are consistent with known causal processes that do not themselves “beg for explanation” – our morals are subject to evolutionary forces, reputations are a key evolutionary force, and reputations can only depend on information others have and are likely to agree upon (e.g., Boyd, 2018; Cosmides, Guzmán, & Tooby, 2018; DeScioli & Kurzban, 2013; Nowak & Sigmund, 2005). Moreover, this theory is “well-specified” in the sense that it specifies a causal process – reputational benefits shape our moral intuitions via biological or cultural evolution. Finally, this causal process makes clear predictions about generality – for example, we should be more sensitive to impact when it comes to our kin or savings decisions (Burum et al., 2020).
Computational models of cognition offer a second exemplar (e.g., Oaksford & Chater, 1994; Quillien & Lucas, 2023; Xu & Tenenbaum, 2007). For instance, in one approach to explaining “anchoring and adjustment” – the fact that numerical estimates can be biased in the direction of an arbitrarily selected value provided one is first asked whether the true value is above or below that value – anchors are thought to provide a “seed” for a cognitive process that only slowly and effortfully adjusts (Lieder, Griffiths, Huys, & Goodman, 2018a). In this model, people start at the seed, then sample a nearby numerical estimate, check the relative plausibility of this estimate, move toward the new estimate if it is judged to be more plausible, and repeat this process as long as it seems worth the cognitive costs. This explanation is “grounded” in the sense that it rests on plausible assumptions about the scarcity of computational resources and the need to rely on sampling algorithms instead of explicit representations of probability distributions (e.g., MacKay, 2003; Vul, Goodman, Griffiths, & Tenenbaum, 2014). This explanation is “well-specified” in the sense that it specifies a causal process, which suggests boundary conditions – people are expected to show more of an anchoring bias the fewer computational resources they allocate to the task, say, due to time constraints, cognitive load, or lack of motivation (Lieder, Griffiths, Huys, & Goodman, 2018b).
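The sample-check-move process described above can be sketched as a Metropolis-style sampler that starts at the anchor and takes only a limited number of adjustment steps. This is an illustrative toy, not the authors' actual model: the Gaussian plausibility function, its parameters, and the proposal step size are all hypothetical stand-ins.

```python
import math
import random

def plausibility(x, belief_mean=50.0, belief_sd=10.0):
    """Unnormalized plausibility of estimate x under a (hypothetical) Gaussian belief."""
    return math.exp(-((x - belief_mean) ** 2) / (2 * belief_sd ** 2))

def anchor_and_adjust(anchor, n_steps, step_sd=5.0, rng=random):
    """Start at the anchor; repeatedly propose a nearby estimate and move to it
    with probability given by its relative plausibility. Fewer steps (less
    computational resource) leaves the final estimate closer to the anchor."""
    estimate = anchor
    for _ in range(n_steps):
        proposal = rng.gauss(estimate, step_sd)
        if rng.random() < plausibility(proposal) / plausibility(estimate):
            estimate = proposal
    return estimate
```

On this sketch, the predicted boundary condition falls out directly: averaging over runs, a low anchor (say 0) with few steps yields estimates biased toward 0, while many steps let the estimate equilibrate near the belief mean.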
We note that well-specified theories might already ameliorate the issues motivating the target article. The target article is partially motivated by “one-off studies” that seem to contradict each other because each is run with different parameter settings and conclusions are over-generalized. However, we believe such over-generalizations would be less likely if researchers limited their conclusions to those warranted by their specified causal process. Consider research on group synergies: Some studies find individuals work better in isolation, while others find they work better in groups. Such findings may only appear contradictory if we rely on overly broad conclusions – for example, “groups are synergistic.” If, instead, we focused on causal processes – for example, “groups are useful for division of labor” – and restricted our conclusions to those warranted by the specified causal process – for example, “groups will perform relatively better when the task demands more division of labor” – we would have an easier time reconciling results across labs – for example, by noting that one lab used a task that lent itself more to division of labor.
We also note that the target article's main prescription does not obviate the need for better theorizing. The authors suggest systematically sampling from the parameter values that existing theories predict might matter (perhaps supplemented by “surrogate models” constructed by training a deep neural network on large amounts of data). However, if existing theories (and surrogate models) are themselves not well-specified or grounded, it is not obvious how the prescribed approach will get us any closer to theories that are, and without such theories, we may still be missing key latent variables not yet considered. For instance, the target article describes one instance (Agrawal, Peterson, & Griffiths, 2020) where the prescribed approach led to new discoveries in the “Moral Machine” paradigm, such as that people are less likely to save criminals than law-abiding citizens. However, without good theorizing, we are left not knowing what causes these discoveries, and hence unable to know their boundary conditions (beyond the dimensions investigated). Nor is it obvious what these discoveries teach us about moral psychology, or the social, cognitive, or biological forces that shape our morals, writ large.
One final note: Without winnowing down the set of theories under consideration, the target article's prescribed method may be unwieldy, since each theory suggests additional variables to systematically investigate. Restricting theories to those that are well-specified and grounded may help reduce the set of theories under consideration, thereby making the prescribed approach more viable.
Competing interest
None.