
Combining meta-learned models with process models of cognition

Published online by Cambridge University Press:  23 September 2024

Adam N. Sanborn*
Affiliation:
Department of Psychology, University of Warwick, Coventry, UK
a.n.sanborn@warwick.ac.uk
https://go.warwick.ac.uk/adamsanborn

Haijiang Yan
Affiliation:
Department of Psychology, University of Warwick, Coventry, UK
haijiang.yan@warwick.ac.uk

Christian Tsvetkov
Affiliation:
Department of Psychology, University of Warwick, Coventry, UK
chris.tsvetkov@warwick.ac.uk

*Corresponding author.

Abstract

Meta-learned models of cognition make optimal predictions for the actual stimuli presented to participants, but investigating judgment biases by constraining neural networks will be unwieldy. We suggest combining them with cognitive process models, which are more intuitive and explain biases. Rational process models, those that can sequentially sample from the posterior distributions produced by meta-learned models, seem a natural fit.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press

Meta-learned models of cognition offer an exciting opportunity to address a central weakness of current cognitive models, whether Bayesian or not: Cognitive models generally do not “see” the experimental stimuli shown to participants. Experimenters instead feed models low-dimensional descriptions of the stimuli, which are often in terms of the psychological features imagined by the experimenter, or sometimes are the psychological descriptions that best fit participants’ judgments (e.g., stimulus similarity judgments; Nosofsky, Sanders, Meagher, & Douglas, 2018).

For example, in studies of probability judgment, participants have been asked to judge the probability that “Bill plays jazz for a hobby” after having been given the description, “Bill is 34 years old. He is intelligent, but unimaginative, compulsive, and generally lifeless. In school, he was strong in mathematics but weak in social studies and humanities” (Tversky & Kahneman, 1983). Current probability judgment models reduce these descriptions down to a single unknown number, and attempt to find the latent probability that best fits the data (e.g., Zhu, Sanborn, & Chater, 2020).

Models trained on the underlying statistics of the environment, as meta-learned models are, can bypass this need to infer a latent variable, instead making predictions from the actual descriptions used. Indeed, even relatively simple models of semantics that locate phrases in a vector space produce judgments that correlate with the probabilities experimental participants give (Bhatia, 2017). Meta-learned models could thus explain a great deal of the variability in human behavior, and allow experimenters to generalize beyond the stimuli shown to participants.
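To make the idea concrete, here is a minimal sketch of a vector-space judgment of this kind. The embed function is hypothetical (standing in for any pretrained sentence encoder), and the mapping from cosine similarity to a judgment is purely illustrative rather than Bhatia's (2017) fitted model.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical sentence-embedding function (any pretrained text
    encoder would do); returns a fixed-length vector for the text."""
    raise NotImplementedError

def similarity_judgment(description: str, hypothesis: str) -> float:
    """Cosine similarity between the description and the hypothesis,
    rescaled from [-1, 1] to [0, 1] as a rough proxy for a judged
    probability."""
    d, h = embed(description), embed(hypothesis)
    cosine = float(d @ h / (np.linalg.norm(d) * np.linalg.norm(h)))
    return (cosine + 1.0) / 2.0

# Example (requires a real embed() implementation):
# similarity_judgment("Bill is 34 years old. He is intelligent, but "
#                     "unimaginative, compulsive, and generally lifeless...",
#                     "Bill plays jazz for a hobby.")
```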

However, used as descriptive models, normative meta-learned models of cognition inherit a fundamental problem from the Bayesian approach: people's reliable deviations from normative behavior. One compelling line of research shows that probability judgments are incoherent in a way that Bayesian models are not. Using the above example of Bill, Tversky and Kahneman (1983) found participants ranked the probability of “Bill is an accountant who plays jazz for a hobby” as higher than that of “Bill plays jazz for a hobby.” This violates the extension rule of probability because the set of all accountants who play jazz for a hobby is a subset of all people who play jazz for a hobby, no matter how Bill is described.
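Stated formally, the extension rule requires, for any probability distribution over Bill's attributes,

$$P(\text{accountant} \wedge \text{jazz hobby}) \le P(\text{jazz hobby}),$$

so ranking the conjunction above its constituent cannot be coherent.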

The target article discusses constraining meta-learned models to better describe behavior, such as reducing the number of hidden units or restricting the representational fidelity of units. These manipulations have produced a surprising and interesting range of biases, including stochastic and incoherent probability judgments (Dasgupta, Schulz, Tenenbaum, & Gershman, 2020). However, this is just the start of explaining human biases. Even a single bias such as the conjunction fallacy has intricacies, such as the higher rate of conjunction fallacies when choosing versus estimating (Wedell & Moro, 2008), and the greater variability in judgments of conjunctions than in judgments of simple events (Costello & Watts, 2017).

Cognitive process models aim to explain these biases in detail. For conjunction fallacies, a variety of well-supported models exist, based on ideas such as sampling events with noise in the retrieval process (Costello & Watts, 2014), sacrificing probabilistic coherence to improve the accuracy of sample-based judgments (Zhu et al., 2020), representing conjunctions as a weighted average of simple events (Juslin, Nilsson, & Winman, 2009), or using quantum probability (Busemeyer, Pothos, Franco, & Trueblood, 2011). These kinds of models capture many details of the empirical effects through simple and intuitive mechanisms, such as adjusting the amount of noise or the number of samples, which helps identify experiments to distinguish between them.
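As one concrete illustration, a simplified simulation in the spirit of the probability-theory-plus-noise account (Costello & Watts, 2014) shows how read-out noise alone can produce conjunction fallacies on a fraction of trials. The noise rate, sample size, and event probabilities below are illustrative assumptions rather than fitted values, and the sketch omits much of the full model.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_estimate(p_true, noise=0.1, n_items=20, n_trials=10_000):
    """Read n_items stored instances of an event whose true rate is p_true,
    flipping each read with probability `noise`; return the proportion
    read as 'true' on each simulated trial."""
    reads = (rng.random((n_trials, n_items)) < p_true).astype(float)
    flip = rng.random((n_trials, n_items)) < noise
    noisy = np.where(flip, 1.0 - reads, reads)
    return noisy.mean(axis=1)

# Illustrative probabilities: the conjunction is (correctly) rarer than
# the constituent, yet noisy read-out produces fallacies on some trials.
p_jazz, p_jazz_accountant = 0.25, 0.20
est_constituent = noisy_estimate(p_jazz)
est_conjunction = noisy_estimate(p_jazz_accountant)
print("conjunction fallacy rate:", np.mean(est_conjunction > est_constituent))
```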

Mechanistically modifying meta-learned models to explain cognitive biases to the level that cognitive process models do appears difficult. Changes to network structure are powerful ways to induce different biases and could identify implementation-level constraints in the brain, but the effects of such changes are generally hard to intuit, and training constrained meta-learned models to test different manipulations will be slow and computationally expensive. It will therefore be challenging to reproduce existing biases in detail or to design effective experiments for testing these constraints.

Combining meta-learned models with cognitive process models is more promising. One possibility is to have a meta-learned model act as a “front end” that takes stimuli and converts them to a feature-based representation, which is then operated on by a cognitive process model. The parameters of the cognitive process model could be fit to human data, or the cognitive process model could potentially be encoded into the network (e.g., Peterson, Bourgin, Agrawal, Reichman, & Griffiths, 2021), with meta-learning then training the front end and the cognitive-process parameters end-to-end.
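A minimal sketch of the “front end plus process model” arrangement might look as follows; meta_learned_frontend is a placeholder for a frozen meta-learned network, and the logistic process model and squared-error fit are only illustrative choices, not a specific published model.

```python
import numpy as np
from scipy.optimize import minimize

def meta_learned_frontend(stimulus: str) -> np.ndarray:
    """Placeholder for a frozen meta-learned network that maps a raw
    stimulus (e.g., the verbal description of Bill) to a feature vector."""
    raise NotImplementedError

def fit_process_model(stimuli, human_judgments):
    """Fit only the parameters of a toy process model (a logistic
    read-out of the front-end features) to human judgments, keeping
    the meta-learned front end fixed."""
    X = np.stack([meta_learned_frontend(s) for s in stimuli])
    y = np.asarray(human_judgments, dtype=float)

    def loss(params):
        preds = 1.0 / (1.0 + np.exp(-(X @ params)))
        return float(np.mean((preds - y) ** 2))

    return minimize(loss, x0=np.zeros(X.shape[1])).x
```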

However, as meta-learned models of cognition produce posterior predictive distributions, rational process models offer a straightforward connection that does not require retraining meta-learned models. Rational process models do not use the posterior predictive distribution directly, but instead assume that it is approximated, most often with a statistical sampling algorithm, and that judgments are based on a summary statistic of the samples (e.g., the posterior mean or median, depending on the task; Griffiths, Vul, & Sanborn, 2012). Such a model can explain details of the conjunction fallacy, and also a wide range of other biases, such as stochastic choice, anchoring and repulsion effects in estimates, long-range autocorrelations in judgment, and the flaws in random sequence generation (Castillo, León-Villagrá, Chater, & Sanborn, 2024; Spicer, Zhu, Chater, & Sanborn, 2022; Vul, Goodman, Griffiths, & Tenenbaum, 2014; Zhu, León-Villagrá, Chater, & Sanborn, 2022; Zhu, Sundh, Spicer, Chater, & Sanborn, 2023). What these models have lacked, however, is a principled way to construct the posterior predictive distribution from environmental statistics, and here meta-learned models offer that exciting possibility.
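For example, a sample-based judgment rule in the spirit of the Bayesian sampler (Zhu et al., 2020) can be sketched in a few lines. The sample size, prior weight, and event probabilities below are illustrative assumptions, and the models cited above add further structure (e.g., autocorrelated sampling) that this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_based_judgment(p_event, n_samples=5, beta=1.0):
    """Draw a few samples from the posterior predictive for an event and
    regularize the observed count with a symmetric Beta(beta, beta)
    prior, in the spirit of the Bayesian sampler."""
    successes = rng.binomial(n_samples, p_event)
    return (successes + beta) / (n_samples + 2.0 * beta)

# With few samples, judgments are stochastic, pulled toward 0.5, and can
# violate the extension rule on individual trials.
p_b, p_ab = 0.25, 0.20          # illustrative values, not fitted to data
judgments = np.array([(sample_based_judgment(p_ab), sample_based_judgment(p_b))
                      for _ in range(10_000)])
print("mean judgment of conjunction:", judgments[:, 0].mean())
print("conjunction fallacy rate:", np.mean(judgments[:, 0] > judgments[:, 1]))
```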

While rational process models offer what we think is a natural choice for integration, any sort of combination with existing cognitive models offers benefits. Being able to explain the details of biases, as cognitive process models do, while also being sensitive to the actual stimuli shown to participants, is a powerful combination that moves toward the long-standing goal of a general model of cognition. Overall, we see meta-learned models of cognition not as supplanting existing cognitive models, but as a way to make them much more powerful and relevant to understanding and predicting behavior.

Acknowledgments

None.

Financial support

A. N. S. and C. T. were supported by a European Research Council consolidator grant (817492 – SAMPLING). H. Y. was supported by a Chancellor's International Scholarship from the University of Warwick.

Competing interest

None.

References

Bhatia, S. (2017). Associative judgment and vector space semantics. Psychological Review, 124(1), 1–20. http://dx.doi.org/10.1037/rev0000047
Busemeyer, J. R., Pothos, E. M., Franco, R., & Trueblood, J. S. (2011). A quantum theoretical explanation for probability judgment errors. Psychological Review, 118(2), 193–218. https://doi.org/10.1037/a0022542
Castillo, L., León-Villagrá, P., Chater, N., & Sanborn, A. (2024). Explaining the flaws in human random generation as local sampling with momentum. PLoS Computational Biology, 20(1), e1011739. https://doi.org/10.1371/journal.pcbi.1011739
Costello, F., & Watts, P. (2014). Surprisingly rational: Probability theory plus noise explains biases in judgment. Psychological Review, 121(3), 463–480. https://doi.org/10.1037/a0037010
Costello, F., & Watts, P. (2017). Explaining high conjunction fallacy rates: The probability theory plus noise account. Journal of Behavioral Decision Making, 30(2), 304–321. https://dx.doi.org/10.1002/bdm.1936
Dasgupta, I., Schulz, E., Tenenbaum, J. B., & Gershman, S. J. (2020). A theory of learning to infer. Psychological Review, 127(3), 412–441. https://doi.org/10.1037/rev0000178
Griffiths, T. L., Vul, E., & Sanborn, A. N. (2012). Bridging levels of analysis for probabilistic models of cognition. Current Directions in Psychological Science, 21(4), 263–268. https://doi.org/10.1177/0963721412447619
Juslin, P., Nilsson, H., & Winman, A. (2009). Probability theory, not the very guide of life. Psychological Review, 116(4), 856–874. https://doi.org/10.1037/a0016979
Nosofsky, R. M., Sanders, C. A., Meagher, B. J., & Douglas, B. J. (2018). Toward the development of a feature-space representation for a complex natural category domain. Behavior Research Methods, 50, 530–556. https://doi.org/10.3758/s13428-017-0884-8
Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D., & Griffiths, T. L. (2021). Using large-scale experiments and machine learning to discover theories of human decision-making. Science, 372(6547), 1209–1214. https://doi.org/10.1126/science.abe2629
Spicer, J., Zhu, J. Q., Chater, N., & Sanborn, A. N. (2022). Perceptual and cognitive judgments show both anchoring and repulsion. Psychological Science, 33(9), 1395–1407. https://doi.org/10.1177/09567976221089599
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90(4), 293–315. https://doi.org/10.1037/0033-295X.90.4.293
Vul, E., Goodman, N., Griffiths, T. L., & Tenenbaum, J. B. (2014). One and done? Optimal decisions from very few samples. Cognitive Science, 38(4), 599–637. https://doi.org/10.1111/cogs.12101
Wedell, D. H., & Moro, R. (2008). Testing boundary conditions for the conjunction fallacy: Effects of response mode, conceptual focus, and problem type. Cognition, 107(1), 105–136. https://doi.org/10.1016/j.cognition.2007.08.003
Zhu, J. Q., León-Villagrá, P., Chater, N., & Sanborn, A. N. (2022). Understanding the structure of cognitive noise. PLoS Computational Biology, 18(8), e1010312. https://doi.org/10.1371/journal.pcbi.1010312
Zhu, J.-Q., Sanborn, A. N., & Chater, N. (2020). The Bayesian sampler: Generic Bayesian inference causes incoherence in human probability judgments. Psychological Review, 127(5), 719–748. https://doi.org/10.1037/rev0000190
Zhu, J.-Q., Sundh, J., Spicer, J., Chater, N., & Sanborn, A. N. (2023). The autocorrelated Bayesian sampler: A rational process for probability judgments, estimates, confidence intervals, choices, confidence judgments, and response times. Psychological Review, 131(2), 456–493. https://doi.org/10.1037/rev0000427