Building machines that learn and think for themselves

Matthew Botvinick; David G. T. Barrett; Peter Battaglia; Nando de Freitas; Darshan Kumaran; Joel Z Leibo; Timothy Lillicrap; Joseph Modayil; Shakir Mohamed; Neil C. Rabinowitz; Danilo J. Rezende; Adam Santoro; Tom Schaul; Christopher Summerfield; Greg Wayne; Theophane Weber; Daan Wierstra; Shane Legg; Demis Hassabis

doi:10.1017/S0140525X17000048

Published online by Cambridge University Press: 10 November 2017

Affiliation (all authors): DeepMind, Kings Cross, London N1C 4AG, United Kingdom. http://www.deepmind.com

Emails: botvinick@google.com; barrettdavid@google.com; peterbattaglia@google.com; nandodefreitas@google.com; dkumaran@google.com; jzl@google.com; countzero@google.com; modayil@google.com; shakir@google.com; ncr@google.com; danilor@google.com; adamsantoro@google.com; schaul@google.com; csummerfield@google.com; gregwayne@google.com; theophane@google.com; wierstra@google.com; legg@google.com; demishassahassaibis@google.com

Abstract

We agree with Lake and colleagues on their list of “key ingredients” for building human-like intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand engineering. We believe an approach centered on autonomous learning has the greatest chance of success as we scale toward real-world complexity, tackling domains for which ready-made formal models are not available. Here, we survey several important examples of the progress that has been made toward building autonomous agents with human-like abilities, and highlight some outstanding challenges.

Type: Open Peer Commentary
Information: Behavioral and Brain Sciences, Volume 40, 2017, e255

DOI: https://doi.org/10.1017/S0140525X17000048
Copyright: Copyright © Cambridge University Press 2017
