Hostname: page-component-78c5997874-dh8gc Total loading time: 0 Render date: 2024-11-15T04:37:31.691Z Has data issue: false hasContentIssue false

A utility-based analysis of equilibria in multi-objective normal-form games

Published online by Cambridge University Press:  30 June 2020

Roxana Rădulescu
Affiliation:
Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 2, Brussels1050, Belgium, e-mails: roxana.radulescu@vub.be, ann.nowe@vub.be
Patrick Mannion
Affiliation:
School of Computer Science, National University of Ireland Galway, GalwayH91 TK33, Ireland, e-mail: patrick.mannion@nuigalway.ie
Yijie Zhang
Affiliation:
Universiteit van Amsterdam, Amsterdam, The Netherlands, e-mail: yijie.zhang@student.uva.nl
Diederik M. Roijers
Affiliation:
Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 2, Brussels1050, Belgium, e-mails: roxana.radulescu@vub.be, ann.nowe@vub.be Microsystems Technology, HU University of Applied Sciences Utrecht, Heidelberglaan 15, 3584CSUtrecht, The Netherlands, e-mail: diederik.yamamoto-roijers@hu.nl
Ann Nowé
Affiliation:
Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 2, Brussels1050, Belgium, e-mails: roxana.radulescu@vub.be, ann.nowe@vub.be

Abstract

In multi-objective multi-agent systems (MOMASs), agents explicitly consider the possible trade-offs between conflicting objective functions. We argue that compromises between competing objectives in MOMAS should be analyzed on the basis of the utility that these compromises have for the users of a system, where an agent’s utility function maps their payoff vectors to scalar utility values. This utility-based approach naturally leads to two different optimization criteria for agents in a MOMAS: expected scalarized returns (ESRs) and scalarized expected returns (SERs). In this article, we explore the differences between these two criteria using the framework of multi-objective normal-form games (MONFGs). We demonstrate that the choice of optimization criterion (ESR or SER) can radically alter the set of equilibria in a MONFG when nonlinear utility functions are used.

Type
Adaptive and Learning Agents
Copyright
© The Author(s), 2020. Published by Cambridge University Press

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

*

This article extends an earlier unpublished paper (Rădulescu et al., 2019) that was originally presented at the Adaptive and Learning Agents Workshop 2019.

References

Arifovic, J., Boitnott, J. F. & Duffy, J. 2016. Learning correlated equilibria: an evolutionary approach. Journal of Economic Behavior & Organization 157, 171–190.Google Scholar
Aumann, R. J. 1974. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics 1(1), 67–96.CrossRefGoogle Scholar
Aumann, R. J. 1987. Correlated equilibrium as an expression of bayesian rationality. Econometrica: Journal of the Econometric Society 1, 118.CrossRefGoogle Scholar
Bergstresser, K. and Yu, P. 1977. Domination structures and multicriteria problems in n-person games. Theory and Decision 8(1), 5–48.CrossRefGoogle Scholar
Blackwell, D.et al. 1956. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics 6(1), 1–8.CrossRefGoogle Scholar
Borm, P., Tijs, S. & van den Aarssen, J. 1990. Pareto equilibria in multi-objective games. Methods of Operations Research 60, 303312.Google Scholar
Colby, M. & Tumer, K. 2015. An evolutionary game theoretic analysis of difference evaluation functions. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, 1391–1398. ACM.CrossRefGoogle Scholar
Devlin, S. & Kudenko, D. 2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 225–232.Google Scholar
Foster, D. P. & Vohra, R. 1999. Regret in the on-line decision problem. Games and Economic Behavior 29 (1–2), 7–35.CrossRefGoogle Scholar
Fudenberg, D. & Kreps, D. M. 1993. Learning mixed equilibria. Games and Economic Behavior 5 (3), 320367. ISSN 0899-8256.CrossRefGoogle Scholar
Hart, S. & Schmeidler, D. 1989. Existence of correlated equilibria. Mathematics of Operations Research 14(1), 1825.CrossRefGoogle Scholar
Igarashi, A. & Roijers, D. M. 2017. Multi-criteria coalition formation games. In International Conference on Algorithmic DecisionTheory, 197–213. Springer.CrossRefGoogle Scholar
Jensen, J. L. W. V.et al. 1906. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Actamathematica 30, 175193.Google Scholar
Lozan, V. & Ungureanu, V. 2013. Computing the pareto-nash equilibrium set in finite multi-objective mixed-strategy games. Computer Science Journal of Moldova, 21 (2).Google Scholar
Lozovanu, D., Solomon, D. & Zelikovsky, A. 2005. Multiobjective games and determining pareto-nashequilibria. Buletinul Academiei de Ştiinţe a Republicii Moldova. Matematica, (3), 115–122.Google Scholar
Mannion, P., Devlin, S., Duggan, J. & Howley, E. 2018. Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning. The Knowledge Engineering Review 33, e23.CrossRefGoogle Scholar
Mannion, P., Devlin, S., Mason, K., Duggan, J. & Howley, E. 2017a. Policy invariance under reward transformations for multi-objective reinforcement learning. Neurocomputing 263, 6073.CrossRefGoogle Scholar
Mannion, P., Duggan, J. & Howley, E. 2016a. An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic Road Transport Support Systems, McCluskey, L. T., Kotsialos, A., Müller, P. J., Klügl, F., Rana, O. & Schumann, R. (eds), 47–66. Springer International Publishing.Google Scholar
Mannion, P., Duggan, J. & Howley, E. 2017b. A theoretical and empirical analysis of reward transformations in multi-objective stochastic games. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2017b.Google Scholar
Mannion, P., Mason, K., Devlin, S., Duggan, J. & Howley, E. 2016b. Multi-objective dynamic dispatch optimisation using multi-agent reinforcement learning. In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2016b.Google Scholar
Mossalam, H., Assael, Y. M., Roijers, D. M. & Whiteson, S. 2016. Multi-objective deep reinforcement learning. In NIPS Workshop on Deep Reinforcement Learning.Google Scholar
Nash, J. 1950. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences 36(1), 4849. ISSN 0027-8424.Google Scholar
Nash, J. 1951. Non-cooperative games. Annals of Mathematics 54(2), 286295.CrossRefGoogle Scholar
Papadimitriou, C. H. & Roughgarden, T. 2008. Computing correlated equilibria in multi-player games. Journal of the ACM (JACM) 55(3), 14.CrossRefGoogle Scholar
Rădulescu, R., Legrand, M., Efthymiadis, K., Roijers, D. M. & Nowé, A. 2018. Deep multi-agent reinforcement learning in a homogeneous open population. In Proceedings of the 30th Benelux Conference on Artificial Intelligence (BNAIC 2018), 177–191.Google Scholar
Rădulescu, R., Mannion, P., Roijers, D. & Nowé, A. 2019. Equilibria in multi-objective games: a utility-based perspective. In Adaptive and Learning Agents Workshop (at AAMAS 2019), May 2019.Google Scholar
Rădulescu, R., Mannion, P., Roijers, D. M. and Nowé, A. 2020. Multi-objective multi-agent decision making: a utility-based analysis and survey. Autonomous Agents and Multi-Agent Systems 34 (10).CrossRefGoogle Scholar
Reymond, M., Patyn, C., Rădulescu, R., Deconinck, G. & Nowé, A. 2018. Reinforcement learning for demand response of domestic household appliances. In Proceedings of the Adaptive and Learning Agents Workshop at FAIM 2018.Google Scholar
Roijers, D. M. 2016. Multi-Objective Decision-Theoretic Planning. PhD thesis, University of Amsterdam.CrossRefGoogle Scholar
Roijers, D. M., Steckelmacher, D. & Nowé, A. 2018. Multi-objective reinforcement learning for the expected utility of the return. In Proceedings of the Adaptive and Learning Agents Workshop at FAIM 2018.Google Scholar
Roijers, D. M., Vamplew, P., Whiteson, S. & Dazeley, R. 2013. A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research 48, 67113.CrossRefGoogle Scholar
Roijers, D. M. & Whiteson, S. 2017. Multi-objective decision making. Synthesis Lectures on Artificial Intelligence and Machine Learning 11(1), 1129.CrossRefGoogle Scholar
Shapley, L. S. & Rigby, F. D. 1959. Equilibrium points in games with vector payoffs. Naval Research Logistics Quarterly 6 (1), 5761.CrossRefGoogle Scholar
Talpert, V., Sobh, I., Kiran, B. R., Mannion, P., Yogamani, S., El-Sallab, A. & Perez, P. 2019. Exploring applications of deep reinforcement learning for real-world autonomous driving systems. In International Conference on Computer Vision Theory and Applications (VISAPP), February 2019.Google Scholar
Vamplew, P., Dazeley, R., Berry, A., Issabekov, R. & Dekker, E. 2011. Empirical evaluation methods for multiobjective reinforcement learning algorithms. Machine Learning 84 (1–2), 5180.CrossRefGoogle Scholar
Van Moffaert, K. & Nowé, A. 2014. Multi-objective reinforcement learning using sets of pareto dominating policies. The Journal of Machine Learning Research 15(1), 34833512.Google Scholar
Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Jarrod Millman, K., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C., Polat, İ., Feng, Y., Moore, E. W., Vand erPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., van Mulbregt, P. & Contributors, S. 2019. SciPy 1.0–Fundamental Algorithms for Scientific Computing in Python. arXiv e-prints, art. arXiv:1907.10121, July 2019.Google Scholar
Voorneveld, M., Vermeulen, D. & Borm, P. 1999. Axiomatizations of paretoequilibria in multicriteria games. Games and Economic Behavior 280 (1), 146–154.Google Scholar
Walraven, E. & Spaan, M. T. J. 2016. Planning under uncertainty for aggregated electric vehicle charging with renewable energy supply. In Proceedings of the European Conference on Artificial Intelligence, 904–912.Google Scholar
Watkins, C. J. C. H. Learning from Delayed Rewards. PhD thesis, King’s College, Cambridge, UK, 1989.Google Scholar
Wierzbicki, A. P. 1995. Multiple criteria games – theory and applications. Journal of Systems Engineering and Electronics 60 (2), 65–81.Google Scholar
Wiggers, A. J., Oliehoek, F. A. & Roijers, D. M. 2016. Structure in the value function of two-player zero-sum games of incomplete information. In Proceedings of the Twenty-second European Conference on Artificial Intelligence, 1628–1629. IOS Press.Google Scholar
Yliniemi, L., Agogino, A. K. & Tumer, K. 2015. Simulation of the introduction of new technologies in air traffic management. Connection Science 270 (3), 269–287.Google Scholar
Yliniemi, L. & Tumer, K. 2016. Multi-objective multiagent credit assignment in reinforcement learning and nsga-ii. Soft Computing 200 (10), 3869–3887.Google Scholar
Zhang, Y., Rădulescu, R., Mannion, P., Roijers, D. M. & Nowé, A. 2020. Opponent modelling for reinforcement learning in multi-objective normal form games. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), May 2020.Google Scholar
Zinkevich, M., Greenwald, A. & Littman, M. L. 2006. Cyclic equilibria in markov games. In Advances in Neural Information Processing Systems, 16411648.Google Scholar
Zintgraf, L. M., Kanters, T. V., Roijers, D. M., Oliehoek, F. A. & Beau, P. 2015. Quality assessment of MORL algorithms: a utility-based approach. In Benelearn 2015: Proceedings of the Twenty-Fourth Belgian-Dutch Conference on Machine Learning.Google Scholar
Zintgraf, L. M., Roijers, D. M., Linders, S., Jonker, C. M. & Nowé, A. 2018. Ordered preference elicitation strategies for supporting multi-objective decision making. In Proceedings of the 17th International Conference on Autonomous Agents and Multi-Agent Systems, 1477–1485. International Foundation for Autonomous Agents and Multiagent Systems.Google Scholar