Against naïve induction from experimental data
Published online by Cambridge University Press: 05 February 2024
Abstract
This commentary argues against the indictment of current experimental practices such as piecemeal testing, and the proposed integrated experiment design (IED) approach, which we see as yet another attempt at automating scientific thinking. We identify a number of undesirable features of IED that lead us to believe that its broad application will hinder scientific progress.
Type: Open Peer Commentary
Copyright © The Author(s), 2024. Published by Cambridge University Press
References
Birnbaum, M. H. (2008). New paradoxes of risky decision making. Psychological Review, 115, 463–501.
Chang, H. (2004). Inventing temperature: Measurement and scientific progress. Oxford University Press.
Cox, G. E., & Shiffrin, R. M. (2017). A dynamic approach to recognition memory. Psychological Review, 124, 795–860.
Dunn, J. C., & Rao, L. L. (2019). Models of risky choice: A state-trace and signed difference analysis. Journal of Mathematical Psychology, 90, 61–75.
Garcia-Marques, L., & Ferreira, M. B. (2011). Friends and foes of theory construction in psychological science: Vague dichotomies, unified theories of cognition, and the new experimentalism. Perspectives on Psychological Science, 6, 192–201.
Hacking, I. (1983). Representing and intervening: Introductory topics in the philosophy of natural science. Cambridge University Press.
Hotaling, J. M., Donkin, C., Jarvstad, A., & Newell, B. R. (2022). MEM-EX: An exemplar memory model of decisions from experience. Cognitive Psychology, 138, 101517.
Humphreys, M. S., Bain, J. D., & Pike, R. (1989). Different ways to cue a coherent memory system: A theory for episodic, semantic, and procedural tasks. Psychological Review, 96, 208–233.
Kellen, D. (2019). A model hierarchy for psychological science. Computational Brain & Behavior, 2, 160–165.
Kellen, D., Steiner, M. D., Davis-Stober, C. P., & Pappas, N. R. (2020). Modeling choice paradoxes under risk: From prospect theories to sampling-based accounts. Cognitive Psychology, 118, 101258.
Lewandowsky, S., Oberauer, K., & Brown, G. D. (2009). No temporal decay in verbal short-term memory. Trends in Cognitive Sciences, 13, 120–126.
Maxwell, J. C. (1890/1965). General considerations concerning scientific apparatus. In Niven, W. D. (Ed.), The scientific papers of James Clerk Maxwell (Vol. 2, pp. 505–522). Dover.
Meehl, P. E. (1990). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1, 108–141.
Oberauer, K., Lewandowsky, S., Awh, E., Brown, G. D., Conway, A., Cowan, N., … Ward, G. (2018). Benchmarks for models of short-term and working memory. Psychological Bulletin, 144, 885–958.
Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D., & Griffiths, T. L. (2021). Using large-scale experiments and machine learning to discover theories of human decision-making. Science (New York, N.Y.), 372, 1209–1214.
Proulx, T., & Morey, R. D. (2021). Beyond statistical ritual: Theory in psychological science. Perspectives on Psychological Science, 16, 671–681.
Roediger, H. L. (2008). Relativity of remembering: Why the laws of memory vanished. Annual Review of Psychology, 59, 225–254.
Roediger, H. L. III, & Blaxton, T. A. (1987). Retrieval modes produce dissociations in memory for surface information. In Gorfein, D. S. & Hoffman, R. R. (Eds.), Memory and learning: The Ebbinghaus Centennial conference (pp. 349–379). Erlbaum.
Seamon, J. G., Williams, P. C., Crowley, M. J., Kim, I. J., Langer, S. A., Orne, P. J., & Wishengrad, D. L. (1995). The mere exposure effect is based on implicit memory: Effects of stimulus type, encoding conditions, and number of exposures on recognition and affect judgments. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 711–721.
Shiffrin, R. M., Börner, K., & Stigler, S. M. (2018). Scientific progress despite irreproducibility: A seeming paradox. Proceedings of the National Academy of Sciences of the United States of America, 115, 2632–2639.
Shiffrin, R. M., & Nobel, P. A. (1997). The art of model development and testing. Behavior Research Methods, Instruments, & Computers, 29, 6–14.
Singmann, H., Kellen, D., Cox, G. E., Chandramouli, S. H., Davis-Stober, C. P., Dunn, J. C., … Shiffrin, R. M. (2023). Statistics in the service of science: Don't let the tail wag the dog. Computational Brain & Behavior, 6, 64–83.
Trendler, G. (2009). Measurement theory, psychology and the revolution that cannot happen. Theory & Psychology, 19, 579–599.
Turner, B. M. (2019). Toward a common representational framework for adaptation. Psychological Review, 126, 660–692.
Vergauwe, E., & Cowan, N. (2015). Working memory units are all in your head: Factors that influence whether features or objects are the favored units. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 1404–1416.
After so many years observing the prosecution of p-values and everyday laboratory life, we are pleased to see a growing number of researchers turning their attention to critical matters such as theory development and experimentation (e.g., Proulx & Morey, 2021). But as we transition into these important new debates, it is crucial to avoid past intellectual excesses. In particular, we note a tendency to embrace passive technological solutions to problems of scientific inference and discovery that make little room for the kind of active theory building and critical thinking that in fact result in meaningful scientific advances (see Singmann et al., 2023). In this vein, we wish to express serious reservations regarding Almaatouq et al.'s critique.
The observation of puzzling, incongruent, and incommensurate results across studies is a common affair in the experimental sciences (see Chang, 2004; Galison, 1987; Hacking, 1983). Indeed, one of the central roles of experimentation is to “create, produce, refine and stabilize phenomena” (Hacking, 1983, p. 229), which is achieved through an iterative process that includes the ongoing improvement of experimental apparatus (see Chang, 2004; Trendler, 2009) and relevant variables (Jantzen, 2021). This process was discussed long ago by Maxwell (1890/1965), who described it as removing the influence of “disturbing agents” from a “field of investigation.”
Looking back at the history of modern memory research, we can identify this process in the development of experimental tasks (e.g., recognition, cued recall) with clear procedures (study/test phases) and stimuli (e.g., high-frequency words). This process is also manifest in the resolution of empirical puzzles, such as the innumerable exceptions, incongruities, and boundary conditions encountered by researchers in the search for the “laws of memory” (for a review, see Roediger, 2008). Far from insurmountable, these empirical puzzles have been continuously resolved through the interplay of tailored experiments and theories (e.g., Cox & Shiffrin, 2017; Hotaling, Donkin, Jarvstad, & Newell, 2022; Humphreys, Bain, & Pike, 1989; Roediger & Blaxton, 1987; Seamon et al., 1995; Turner, 2019; Vergauwe & Cowan, 2015). More specifically, candidate theories are constructed to explain existing results by postulating constructs (e.g., “trace strength”) and specifying how those constructs are related to observables (e.g., “more study time leads to more trace strength which leads to faster response times”). These theories also specify what should not be relevant, thereby identifying potential confounding variables that future experiments should control. For an exemplary case, consider the domain of short-term memory, where we can find a large body of empirical phenomena (e.g., Oberauer et al., 2018) alongside explanatory accounts that can accommodate them (e.g., interference-based theories; see Lewandowsky, Oberauer, & Brown, 2009).
Against this backdrop, it is difficult to find Almaatouq et al.'s critique convincing. On the one hand, they fail to explain the success of existing experimental practices (e.g., piecemeal testing) in domains such as human memory. On the other, their treatment of case studies such as “group synergy,” which has amassed a wealth of conflicting findings, does not include any indication that the process described above has failed. This omission leaves open a number of possible explanations. For example, incongruent results may reflect experimental artifacts or hidden ceteris paribus clauses and other preconditions (Meehl, 1990, p. 109) – can we really say that these procedures have been thoroughly pursued? Alternatively, incongruent results could be a sign that those results should not be treated as part of the same “space” in the first place, that is, that they do not define a cohesive body of results that can be explained by a common theory.
Moving on to the actual proposal of integrated experiment design (IED), we find its potential contribution to be largely negative. Referring back to Maxwell's (1890/1965) description, what IED proposes is to allow “disturbing agents” back into the “field of investigation” as long as they are appropriately tagged and recorded. It is difficult to imagine how Newton's laws of motion could ever emerge from large-scale experiments evaluating different shapes of objects, velocities, viscosities, surface textures, and so on. Our main concerns with IED are summarized below:
(1) By placing a premium on commensurability, IED decreases the chances of new and unexpected findings (Shiffrin, Börner, & Stigler, 2018).
(2) By shifting researchers’ resources toward the joint observation of a large number of factors, IED disrupts the piecemeal efforts in experimentation and theorization that illuminate the processes underlying human data generation. For instance, it makes it difficult to tell an important result from one caused by a confound (for discussions, see Garcia-Marques & Ferreira, 2011; Kellen, 2019; Shiffrin & Nobel, 1997).
(3) IED turns existential-abductive reasoning on its head: Instead of developing explanatory constructs (e.g., model development) in response to existing covariational information, a construct would be assumed a priori in the form of an empty vessel, to be later filled by the results of an experiment manipulating factors presumably related to it. For instance, the construct “attention” would be identified with the experimental manipulations thought to be relevant to “attention.” This concern is exemplified by the treatment of the so-called Moral Machine, a statistical model summarizing the observed relationships between moral judgments and a host of variables, as a bona fide theory of moral reasoning.
(4) By introducing a large number of factors, IED can easily degrade researchers’ ability to identify which theoretical components are doing the legwork and which ones are failing, especially when compared to piecemeal testing (e.g., Birnbaum, 2008; Dunn & Rao, 2019; Kellen, Steiner, Davis-Stober, & Pappas, 2020). The recent application of IED to risky-choice modeling (Peterson, Bourgin, Agrawal, Reichman, & Griffiths, 2021) illustrates this concern, as it is unclear which specific circumstances are leading one choice model to outperform another (e.g., is context dependency driven by feedback?).
It is our judgment that there is no one best way to do science, and that attempts to tell scientists how to do their job, including IED, will slow and hinder progress. IED solves a problem that does not exist and introduces one that science should do without.
Financial support
David Kellen was supported by NSF CAREER Award ID 2145308.
Competing interest
None.