
22 - Computational Cognitive Models of Reinforcement Learning

from Part III - Computational Modeling of Basic Cognitive Functionalities

Published online by Cambridge University Press: 21 April 2023

Ron Sun
Affiliation: Rensselaer Polytechnic Institute, New York

Summary

This chapter first reviews advanced methods in reinforcement learning (RL), namely, hierarchical RL, distributional RL, meta-RL, RL as inference, inverse RL, and multi-agent RL. Computational and cognitive models based on reinforcement learning are then presented, including detailed models of the basal ganglia, the variety of dopamine neuron responses, the roles of serotonin and other neuromodulators, intrinsic reward and motivation, neuroeconomics, and computational psychiatry.
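For readers new to the methods the summary names, a minimal sketch of tabular temporal-difference (TD) learning may help: the TD error, delta = r + gamma * V(s') - V(s), is the reward prediction error that the dopamine models reviewed in the chapter build on. The chain task, parameter values, and variable names below are illustrative assumptions, not taken from the chapter itself.

import numpy as np

# Minimal tabular TD(0) value learning on a 5-state chain with reward at the end.
# Update rule: V(s) <- V(s) + alpha * delta, where delta = r + gamma * V(s') - V(s)
# is the reward prediction error. All parameter values here are illustrative.

n_states = 5
alpha, gamma = 0.1, 0.9
V = np.zeros(n_states + 1)  # index n_states is the terminal state, fixed at 0

for episode in range(200):
    for s in range(n_states):
        r = 1.0 if s == n_states - 1 else 0.0  # reward only on the final transition
        s_next = s + 1
        delta = r + gamma * V[s_next] - V[s]   # TD error (prediction error)
        V[s] += alpha * delta

print(np.round(V[:n_states], 3))  # values approach gamma ** (steps to reward)

Running this shows the learned values decaying geometrically with distance from the reward, the same gradient of anticipatory value that TD accounts of phasic dopamine activity describe.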

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2023


