Behavioral proxies compete by the time courses of their rewards, including endogenous rewards

George Ainslie

doi:10.1017/S0140525X23002960

Published online by Cambridge University Press: 13 May 2024

George Ainslie*
Affiliation: Department of Veterans Affairs Medical Center, Coatesville, PA, USA; www.picoeconomics.org
*Corresponding author: George Ainslie; Email: info@picoeconomics.org

Abstract

Natural selection is slow, so behavioral goals must be based on patterns of reward. Addictions are rewarded in the same way as adaptive choice, so they can be distinguished only by their time course. In addition, the reward process is more plastic than is generally recognized, so abstract goals are shaped by the “legibility” of their proxies.

Type: Open Peer Commentary
Information: Behavioral and Brain Sciences, Volume 47, 2024, e68

DOI: https://doi.org/10.1017/S0140525X23002960
Copyright: Copyright © The Author(s), 2024. Published by Cambridge University Press

The authors’ distinction between goal and proxy is especially useful in “preference learning” (target article, Table 1, sect. 4.1), where our understanding of selection by reward is in flux: (1) Given the necessarily slow pace of natural selection, the effective basis of behavioral goals must be reward. (2) “Addiction and other maladaptive habits” (target article, sect. 4.1, para. 1) depend on the same neural “behavior prioritization” (target article, sect. 4.1, para. 4) as adaptive choice, and can be distinguished only by their time course. (3) The reward process is more plastic than is generally recognized, so abstract goals are shaped by the “legibility” of their proxies.

(1) The authors depict the brain's goal as “the organism's fitness, utility, or wellbeing” (target article, sect. 4.1, para. 2) – evolutionary adaptiveness. But adaptiveness inevitably diverges from rewardingness. The action-selecting reward mechanism has clearly been shaped by natural selection, but the rewards it specifies become outdated when environmental circumstances change, as described in Box 2 in the target article. In the case of drug addictions, evolution is probably “trying” to correct the readiness of the reward mechanism to be hijacked by dopaminergic proxies – for instance, when it selects a gene that reduces the attraction of alcohol in some populations (Agarwal & Goedde, 1989) – but any such correction will take generations to occur. In the meantime, humans perceive welfare as a pattern of reward, and its connection with adaptiveness is purely historical. The connection may even be adversarial, as when birth control increases welfare but reduces numbers of offspring. In the time spans of actual lives, stable goals must be shaped by patterns of reward. As the authors say, “human goals … can be any arbitrary notion of, e.g., utility or wellbeing…” (target article, Box 2).
(2) The authors cite recent brain imaging that has confirmed common-currency theories of how motives compete to determine choice (Coddington & Dudman, 2019; Levy & Glimcher, 2012), which clear the way for a reward-based economy of all mental life (Silver, Singh, Precup, & Sutton, 2021). However, the target article describes motives’ operation only for the case of addiction, where a misguided reward signal leads to proxy failure (target article, sect. 4.1): This jibes with a now-standard account of drug addiction (Volkow, Wise, & Baler, 2017), but leaves unclear how the addiction case should be distinguished from routine reward-based goal setting. I propose that addictive proxies should be called hijackers when they lead to preferences that are only temporary; if they were stable they could not be distinguished from normal goals except by an eventual negative effect on evolutionary fitness (a view attributed to Becker & Murphy, 1988, in target article, Box 2). A goal in the target article's terms would be an objective that survives temporary preferences.

Two mechanisms for temporary preference have been widely proposed: (a) The discounting of future outcomes in a hyperbolic or similar curve (Green & Myerson, 2018) such that nearby events are overvalued; and/or (b) a temporary burst of rewarding power (Loewenstein, O'Donoghue, & Bhatia, 2015), for instance by emotional arousal or a drug. (a) Hyperbolic discounting is a robust finding both in visceral realms (as with nonhumans and young children) and deliberative activities (as in planning for retirement or dealing with global warming). However, when reported by subjects for the valuation of distant events, discounting raises the unexplored question of how quantitative expectations of such events are formed (Ainslie, 2023, pp. 19–22; Rick & Loewenstein, 2008). (b) Most rewards are enabled by appetite, the presence of which is assumed when people evaluate them at a distance. You may not be willing to pay more for a food if shopping while hungry. But some portion of future appetites seems not to be anticipated, especially when considering options that would be preferred only temporarily (Ariely & Loewenstein, 2006; Badger et al., 2007). Perhaps people avoid considering some appetites so as not to arouse them. The partial neglect of future-aroused appetite, which makes a reward curve spike upward in its presence, is also an effect that needs exploration. In any case, some mechanism of temporary preference is necessary to distinguish addictive proxies from stable goals.
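Mechanism (a) can be illustrated with a small numerical sketch. It is not drawn from the commentary: it assumes the simple hyperbola V = A / (1 + kD) that is commonly used in the discounting literature, and the amounts, delays, and parameters (`k`, `rate`, `gap`) are hypothetical values chosen only to show how a preference can be temporary under a hyperbolic curve but not under an exponential one.

```python
import math


def hyperbolic_value(amount, delay, k=1.0):
    """Discounted value under the assumed hyperbola V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)


def exponential_value(amount, delay, rate=0.1):
    """Discounted value under a conventional exponential curve, for contrast."""
    return amount * math.exp(-rate * delay)


def preferred(value_fn, time_until_small, small=50.0, large=100.0, gap=5.0):
    """Return which option is valued more highly at a given distance.

    The smaller reward arrives after `time_until_small`; the larger reward
    arrives `gap` time units after the smaller one. All numbers are
    illustrative assumptions, not empirical estimates.
    """
    v_small = value_fn(small, time_until_small)
    v_large = value_fn(large, time_until_small + gap)
    return "smaller-sooner" if v_small > v_large else "larger-later"


if __name__ == "__main__":
    for distance in (20.0, 0.0):
        print(
            f"distance {distance:>4}: "
            f"hyperbolic -> {preferred(hyperbolic_value, distance)}, "
            f"exponential -> {preferred(exponential_value, distance)}"
        )
```

With these assumed values, the hyperbolic chooser prefers the larger, later reward when both are distant and switches to the smaller, sooner one when it becomes imminent, whereas the exponential chooser ranks the two options the same way at every distance. The switch is what the text calls a temporary preference.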
(3) “Many abstract goals cannot be observed directly” (target article, sect. 3.2) – likewise distant future goals – so they invite the creation of proxies. Such creation accords with many authors’ suspicion that belief is a reward-seeking activity, and even with behaviorist Howard Rachlin's radical proposal that “mental states (including sensations, perceptions, beliefs, knowledge, even pain) are… patterns of overt behavior” (2012, pp. 3–4). That is, beliefs are incentivized by their capacity to produce reward, which may happen entirely within the agent's imagination. Freed from the need for predicting external rewards, proxies may compete like fiat currencies, needing only to be protected from “inflation” by their unique “legibility” (target article, sects. 3.2 and 3.3), which thus overshadows their “ability to predict the future” as their main claim to selection.

A proxy that is a good story may turn into a goal in its own right. It may take the form of a proposition: Eating this food – or avoiding it – is morally good; there is a conspiracy to corrupt our children; Matisse produced the painting I bought. But a proxy that does not predict facts must survive through avoiding inflation by not just legibility but also singularity – standing out from alternative legible proxies by such factors as a unique logical argument, a parsimonious explanation, a rare coincidence, or endorsement by an authority (discussed in Ainslie, 2013; 2017, pp. 178–184; 2023, pp. 19–22). This property is important not just to psychology, but also to recent proposals that economics recognize the utility of abstract goals (Bénabou & Tirole, 2016; Loewenstein & Molnar, 2018). The self-selecting potential of proxies in neuroscience sets them apart from the other kinds of proxy the authors describe.

Financial support

This work was supported by the Department of Veterans Affairs Medical Center, Coatesville, PA, USA. The opinions expressed are not those of the Department of Veterans Affairs or of the US Government.

Competing interest

None.

References

Agarwal, D. P., & Goedde, H. W. (1989). Human aldehyde dehydrogenases: Their role in alcoholism. Alcohol, 6, 517–523.

Ainslie, G. (2013). Grasping the impalpable: The role of endogenous reward in choices, including process addictions. Inquiry: A Journal of Medical Care Organization, Provision and Financing, 56, 446–469. https://doi.org/10.1080/0020174X.2013.806129, http://www.tandfonline.com/eprint/8fGTuFsnfFunYJKJ7aA7/full

Ainslie, G. (2017). De gustibus disputare: Hyperbolic delay discounting integrates five approaches to choice. Journal of Economic Methodology, 24(2), 166–189. http://dx.doi.org/10.1080/1350178X.2017.1309748

Ainslie, G. (2023). Behavioral construction of the future. Psychology of Addictive Behaviors, 37(1), 13–24. https://doi.org/10.1037/adb0000853

Ariely, D., & Loewenstein, G. (2006). The heat of the moment: The effect of sexual arousal on sexual decision making. Journal of Behavioral Decision Making, 19(2), 87–98.

Badger, G. J., Bickel, W. K., Giordano, L. A., Jacobs, E. A., Loewenstein, G., & Marsch, L. (2007). Altered states: The impact of immediate craving on the valuation of current and future opioids. Journal of Health Economics, 26(5), 865–876.

Becker, G., & Murphy, K. (1988). A theory of rational addiction. Journal of Political Economy, 96, 675–700.

Bénabou, R., & Tirole, J. (2016). Mindful economics: The production, consumption, and value of beliefs. The Journal of Economic Perspectives, 30(3), 141–164.

Coddington, L. T., & Dudman, J. T. (2019). Learning from action: Reconsidering movement signaling in midbrain dopamine neuron activity. Neuron, 104(1), 63–77. https://doi.org/10.1016/J.NEURON.2019.08.036

Green, L., & Myerson, J. (2018). Preference reversals, delay discounting, rational choice, and the brain. In Bermúdez, J. L. (Ed.), Self-control, decision theory, and rationality: New essays (pp. 121–146). Cambridge University Press.

Levy, D. J., & Glimcher, P. W. (2012). The root of all value: A neural common currency for choice. Current Opinion in Neurobiology, 22(6), 1027–1038. https://doi.org/10.1016/J.CONB.2012.06.001

Loewenstein, G., & Molnar, A. (2018). The renaissance of belief-based utility in economics. Nature Human Behaviour, 2(3), 166–167.

Loewenstein, G., O'Donoghue, T., & Bhatia, S. (2015). Modeling the interplay between affect and deliberation. Decision, 2(2), 55–62.

Rachlin, H. (2012). Making IBM's computer, Watson, human. The Behavior Analyst, 35(1), 1–16.

Rick, S., & Loewenstein, G. (2008). Intangibility in intertemporal choice. Philosophical Transactions of the Royal Society B, 363, 3813–3824.

Silver, D., Singh, S., Precup, D., & Sutton, R. S. (2021). Reward is enough. Artificial Intelligence, 299, 103535.

Volkow, N. D., Wise, R. A., & Baler, R. (2017). The dopamine motive system: Implications for drug and food addiction. Nature Reviews Neuroscience, 18(12), Article 12. https://doi.org/10.1038/nrn.2017.130