
Building machines that learn and think for themselves

Published online by Cambridge University Press: 10 November 2017

Matthew Botvinick
David G. T. Barrett
Peter Battaglia
Nando de Freitas
Darshan Kumaran
Joel Z Leibo
Timothy Lillicrap
Joseph Modayil
Shakir Mohamed
Neil C. Rabinowitz
Danilo J. Rezende
Adam Santoro
Tom Schaul
Christopher Summerfield
Greg Wayne
Theophane Weber
Daan Wierstra
Shane Legg
Demis Hassabis
Affiliation (all authors):
DeepMind, Kings Cross, London N1C 4AG, United Kingdom. botvinick@google.com; barrettdavid@google.com; peterbattaglia@google.com; nandodefreitas@google.com; dkumaran@google.com; jzl@google.com; countzero@google.com; modayil@google.com; shakir@google.com; ncr@google.com; danilor@google.com; adamsantoro@google.com; schaul@google.com; csummerfield@google.com; gregwayne@google.com; theophane@google.com; wierstra@google.com; legg@google.com; demishassahassaibis@google.com; http://www.deepmind.com

Abstract

We agree with Lake and colleagues on their list of “key ingredients” for building human-like intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand-engineering. We believe an approach centered on autonomous learning has the greatest chance of success as we scale toward real-world complexity, tackling domains for which ready-made formal models are not available. Here, we survey several important examples of the progress that has been made toward building autonomous agents with human-like abilities, and highlight some outstanding challenges.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2017 


References

Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., Shillingford, B. & de Freitas, N. (2016) Learning to learn by gradient descent by gradient descent. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 3981–89. Neural Information Processing Systems Foundation.
Battaglia, P., Pascanu, R., Lai, M. & Rezende, D. J. (2016) Interaction networks for learning about objects, relations and physics. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 4502–10. Neural Information Processing Systems Foundation.
Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D. & Munos, R. (2016) Unifying count-based exploration and intrinsic motivation. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 1471–79. Neural Information Processing Systems Foundation.
Blundell, C., Uria, B., Pritzel, A., Li, Y., Ruderman, A., Leibo, J. Z., Rae, J., Wierstra, D. & Hassabis, D. (2016) Model-free episodic control. arXiv preprint 1606.04460. Available at: https://arxiv.org/abs/1606.04460.
Botvinick, M. M. & Cohen, J. D. (2014) The computational and neural basis of cognitive control: Charted territory and new frontiers. Cognitive Science 38:1249–85.
Botvinick, M., Weinstein, A., Solway, A. & Barto, A. (2015) Reinforcement learning, efficient coding, and the statistics of natural tasks. Current Opinion in Behavioral Sciences 5:71–77.
Denil, M., Agrawal, P., Kulkarni, T. D., Erez, T., Battaglia, P. & de Freitas, N. (2016) Learning to perform physics experiments via deep reinforcement learning. arXiv preprint 1611.01843. Available at: https://arxiv.org/abs/1611.01843.
Duan, Y., Schulman, J., Chen, X., Bartlett, P. L., Sutskever, I. & Abbeel, P. (2016) RL²: Fast reinforcement learning via slow reinforcement learning. arXiv preprint 1611.02779. Available at: https://arxiv.org/abs/1611.02779.
Eslami, S. M., Heess, N., Weber, T., Tassa, Y., Kavukcuoglu, K. & Hinton, G. E. (2016) Attend, infer, repeat: Fast scene understanding with generative models. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 3225–33. Neural Information Processing Systems Foundation.
Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Colmenarejo, S. G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A. P., Hermann, K. M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K. & Hassabis, D. (2016) Hybrid computing using a neural network with dynamic external memory. Nature 538(7626):471–76.
Hamrick, J. B., Ballard, A. J., Pascanu, R., Vinyals, O., Heess, N. & Battaglia, P. W. (2017) Metacontrol for adaptive imagination-based optimization. In: Proceedings of the 5th International Conference on Learning Representations (ICLR).
Hochreiter, S., Younger, A. S. & Conwell, P. R. (2001) Learning to learn using gradient descent. In: International Conference on Artificial Neural Networks (ICANN 2001), ed. Dorffner, G., Bischoff, H. & Hornik, K., pp. 87–94. Springer.
Kahneman, D. (2011) Thinking, fast and slow. Macmillan.
Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012) ImageNet classification with deep convolutional neural networks. Presented at the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, December 3–6, 2012. In: Advances in Neural Information Processing Systems 25 (NIPS 2012), ed. Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q., pp. 1097–105. Neural Information Processing Systems Foundation.
Lake, B. M., Lawrence, N. D. & Tenenbaum, J. B. (2016) The emergence of organizing structure in conceptual representation. arXiv preprint 1611.09384. Available at: http://arxiv.org/abs/1611.09384.
Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. (2015a) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–38.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D. & Hassabis, D. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–33.
Ranzato, M., Szlam, A., Bruna, J., Mathieu, M., Collobert, R. & Chopra, S. (2016) Video (language) modeling: A baseline for generative models of natural videos. arXiv preprint 1412.6604. Available at: https://arxiv.org/abs/1412.6604.
Raposo, D., Santoro, A., Barrett, D. G. T., Pascanu, R., Lillicrap, T. & Battaglia, P. (2017) Discovering objects and their relations from entangled scene representations. Presented at the Workshop Track at the International Conference on Learning Representations, Toulon, France, April 24–26, 2017. arXiv preprint 1702.05068. Available at: https://openreview.net/pdf?id=Bk2TqVcxe.
Ravi, S. & Larochelle, H. (2017) Optimization as a model for few-shot learning. Presented at the International Conference on Learning Representations, Toulon, France, April 24–26, 2017. Available at: https://openreview.net/pdf?id=rJY0-Kcll.
Reed, S. & de Freitas, N. (2016) Neural programmer-interpreters. Presented at the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2–5, 2016. arXiv preprint 1511.06279. Available at: https://arxiv.org/abs/1511.06279.
Rezende, D. J., Mohamed, S., Danihelka, I., Gregor, K. & Wierstra, D. (2016) One-shot generalization in deep generative models. Presented at the International Conference on Machine Learning, New York, NY, June 20–22, 2016. Proceedings of Machine Learning Research 48:1521–29.
Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D. & Lillicrap, T. (2016) Meta-learning with memory-augmented neural networks. Presented at the 33rd International Conference on Machine Learning, New York, NY, June 19–24, 2016. Proceedings of Machine Learning Research 48:1842–50.
Schaul, T., Quan, J., Antonoglou, I. & Silver, D. (2016) Prioritized experience replay. Presented at the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2016. arXiv preprint 1511.05952. Available at: https://arxiv.org/abs/1511.05952.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. & Hassabis, D. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–89.
Silver, D., van Hasselt, H., Hessel, M., Schaul, T., Guez, A., Harley, T., Dulac-Arnold, G., Reichert, D., Rabinowitz, N., Barreto, A. & Degris, T. (2017) The predictron: End-to-end learning and planning. In: Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, ed. Precup, D. & Teh, Y. W.
van den Oord, A., Kalchbrenner, N. & Kavukcuoglu, K. (2016) Pixel recurrent neural networks. Presented at the 33rd International Conference on Machine Learning, New York, NY. Proceedings of Machine Learning Research 48:1747–56.
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K. & Wierstra, D. (2016) Matching networks for one shot learning. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 3630–38. Neural Information Processing Systems Foundation.
Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., Blundell, C., Kumaran, D. & Botvinick, M. (2017) Learning to reinforce learn. Presented at the 39th Annual Meeting of the Cognitive Science Society, London, July 26–29, 2017. arXiv preprint 1611.05763. Available at: https://arxiv.org/abs/1611.05763.