Evaluating the learning and performance characteristics of self-organizing systems with different task features

Published online by Cambridge University Press:  27 December 2021

Hao Ji
Affiliation:
Department of Aerospace and Mechanical Engineering, University of Southern California, 3650 McClintock Avenue, OHE 400, Los Angeles, CA 90089-1453, USA
Yan Jin*
Affiliation:
Department of Aerospace and Mechanical Engineering, University of Southern California, 3650 McClintock Avenue, OHE 400, Los Angeles, CA 90089-1453, USA
*Author for correspondence: Yan Jin, E-mail: yjin@usc.edu

Abstract

Self-organizing systems (SOS) are developed to perform complex tasks adaptively in unforeseen situations. Predefining rules for self-organizing agents can be challenging, especially for highly complex tasks and changing environments. Our previous work introduced a multiagent reinforcement learning (RL) model as a design approach to the rule generation problem of SOS. A deep multiagent RL algorithm was devised to train agents to acquire task and self-organizing knowledge. However, the simulation was based on one specific task environment. The sensitivity of SOS to reward functions and the systematic evaluation of SOS designed with multiagent RL remain open issues. In this paper, we introduce a rotation reward function to regulate agent behaviors during training and test the effect of different weights of this reward on SOS performance in two case studies: box-pushing and T-shape assembly. Additionally, we propose three metrics to evaluate the SOS: learning stability, quality of learned knowledge, and scalability. Results show that, depending on the type of task, designers may choose appropriate weights for the rotation reward to realize the full potential of the agents' learning capability. Good learning stability and quality of knowledge can be achieved within an optimal range of team sizes. Scaling up to larger team sizes yields better performance than scaling down.

Type
Research Article
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press
