
A reinforcement learning approach to coordinate exploration with limited communication in continuous action games

Published online by Cambridge University Press: 11 February 2016

Abdel Rodríguez (1), Peter Vrancx (1), Ricardo Grau (2) and Ann Nowé (1)
Affiliations:
(1) Computational Modeling Lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; e-mail: abrodrig@vub.ac.be, pvrancx@vub.ac.be, ann.nowe@vub.ac.be
(2) Center of Studies in Informatics, Universidad Central ‘Marta Abreu’ de Las Villas, Carretera a Camajuaní Km 5, 50100 Villa Clara, Cuba; e-mail: rgrau@uclv.edu.cu

Abstract

Learning automata are reinforcement learners belonging to the class of policy iterators. They have already been shown to exhibit good convergence properties in a wide range of discrete action game settings. Recently, a new formulation of continuous action reinforcement learning automata (CARLA) was proposed. In this paper, we study the behavior of CARLA in continuous action games and propose a novel method for coordinated exploration of the joint-action space. Our method allows a team of independent learners using CARLA to find the optimal joint action in common interest settings. We first show that independent agents using CARLA converge to a local optimum of the continuous action game. We then introduce a method for coordinated exploration that allows the team of agents to find the global optimum of the game. We validate our approach in a number of experiments.
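To make the idea of a continuous action learning automaton concrete, here is a minimal sketch of a single CARLA agent. It assumes the classic Gaussian-kernel update: the agent keeps a probability density over its continuous action range, samples an action, and reinforces the density around that action in proportion to the reward before renormalizing. The class `DiscretizedCARLA`, the grid discretization, the parameters `lam` and `sigma`, and the toy `reward_fn` are hypothetical choices made for illustration. The sketch reproduces neither the newer CARLA formulation studied in the paper nor its coordinated exploration method.

```python
import math
import random


class DiscretizedCARLA:
    """Hypothetical sketch of one CARLA agent acting on [lo, hi].

    The action density f is stored on a uniform grid and updated with the
    classic Gaussian-kernel reinforcement rule (an assumption, not the
    paper's reformulation):
        f'(x) = alpha * (f(x) + beta * lam * exp(-(x - a)**2 / (2 * sigma**2)))
    where a is the sampled action, beta in [0, 1] is the reward, and alpha
    renormalises f' so that it integrates to 1.
    """

    def __init__(self, lo, hi, n_points=201, lam=0.5, sigma=0.05, seed=None):
        self.xs = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
        self.dx = (hi - lo) / (n_points - 1)
        self.f = [1.0 / (hi - lo)] * n_points  # start from the uniform density
        self.lam = lam                         # kernel height (illustrative)
        self.sigma = sigma * (hi - lo)         # kernel width, relative to range
        self.rng = random.Random(seed)

    def integral(self, values=None):
        """Trapezoidal integral of a function sampled on the grid."""
        v = self.f if values is None else values
        return self.dx * (sum(v) - 0.5 * (v[0] + v[-1]))

    def sample(self):
        """Draw an action: grid nodes are picked in proportion to f."""
        return self.rng.choices(self.xs, weights=self.f, k=1)[0]

    def update(self, action, reward):
        """Add a reward-scaled Gaussian bump at `action`, then renormalise."""
        two_s2 = 2.0 * self.sigma ** 2
        g = [fx + reward * self.lam * math.exp(-((x - action) ** 2) / two_s2)
             for x, fx in zip(self.xs, self.f)]
        alpha = 1.0 / self.integral(g)
        self.f = [alpha * gx for gx in g]

    def mode(self):
        """Grid action with the highest density (the agent's current guess)."""
        i = max(range(len(self.f)), key=self.f.__getitem__)
        return self.xs[i]


def reward_fn(x):
    """Hypothetical single-peaked reward in [0, 1], maximal at x = 0.7."""
    return math.exp(-((x - 0.7) ** 2) / 0.01)


agent = DiscretizedCARLA(0.0, 1.0, seed=1)
for _ in range(2000):
    a = agent.sample()              # explore by sampling from the density
    agent.update(a, reward_fn(a))   # reinforce around rewarded actions
print(round(agent.mode(), 2))
```

In a common interest game, each independent learner would run its own such agent and receive the shared team reward. With several agents and a multi-peaked joint reward, this purely local reinforcement is what can leave the team at a local optimum, which is the situation the paper's coordinated exploration is meant to address.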

Type: Articles
Copyright © Cambridge University Press, 2016

