
Autonomous and cooperative control of UAV cluster with multi-agent reinforcement learning

Published online by Cambridge University Press:  13 January 2022

D. Xu
Affiliation:
State Key Laboratory for Strength and Vibration of Mechanical Structures, Xi’an Jiaotong University, Xi’an, 710049, China
G. Chen*
Affiliation:
Shaanxi Province Key Laboratory for Service Environment and Control of Advanced Aircraft, Xi’an Jiaotong University, Xi’an, 710049, China

Abstract

In this paper, we explore Multi-Agent Reinforcement Learning (MARL) methods for unmanned aerial vehicle (UAV) clusters. Because current UAV clusters are still at the programmed-control stage, fully autonomous and intelligent cooperative combat has not yet been realised. To enable a UAV cluster to plan autonomously in a changing environment and to cooperate towards the combat goal, we propose a new MARL framework. It adopts centralised training with decentralised execution and uses an Actor-Critic network to select each execution action and then evaluate it. The new algorithm makes three key improvements to the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm: first, the learning framework is improved, making the calculated Q values more accurate; second, a collision-avoidance setting is added, which raises the operational safety factor; third, the reward mechanism is adjusted, which effectively improves the cluster's cooperative ability. The improved MADDPG algorithm is then tested on two conventional combat missions. The simulation results show that learning efficiency is clearly improved and that the operational safety factor is further increased compared with the original algorithm.
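The abstract gives no implementation detail, but the centralised-training, decentralised-execution structure it describes is that of MADDPG (Lowe et al., cited in the references below). The following sketch, a minimal illustration rather than the authors' code, shows that structure in PyTorch: each UAV agent owns a decentralised actor that acts from its local observation alone, while a centralised critic is trained on the joint observations and actions of the whole cluster. The cluster size, network dimensions and collision-penalty shaping are illustrative assumptions.

```python
# Minimal sketch of centralised training with decentralised execution,
# the MADDPG structure the paper builds on. All dimensions, layer sizes
# and the reward shaping are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2   # assumed cluster size and spaces


class Actor(nn.Module):
    """Decentralised policy: maps one agent's local observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)


class CentralCritic(nn.Module):
    """Centralised critic: scores a joint state-action pair. It sees the
    observations and actions of ALL agents, but only during training."""
    def __init__(self):
        super().__init__()
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))


def shaped_reward(task_reward, min_separation, safe_dist=5.0, penalty=10.0):
    """Hypothetical shaping in the spirit of the paper's collision-avoidance
    improvement: penalise any pair of UAVs closing inside a safe distance.
    Both thresholds are assumed values."""
    return task_reward - (penalty if min_separation < safe_dist else 0.0)


actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()               # full MADDPG keeps one critic per agent

# One fake batch of joint experience (batch size 32).
obs = torch.randn(32, N_AGENTS, OBS_DIM)
acts = torch.stack([actors[i](obs[:, i]) for i in range(N_AGENTS)], dim=1)

# Training: the critic consumes the flattened joint observation-action pair.
q = critic(obs.flatten(1), acts.flatten(1))    # -> shape (32, 1)

# Execution: agent 0 acts from its own observation alone.
a0 = actors[0](obs[:, 0])                      # -> shape (32, 2)
print(q.shape, a0.shape)
```

In full MADDPG every agent maintains its own centralised critic plus target copies of all networks; the single critic and the omitted target networks here are only to keep the sketch short.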

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of the Royal Aeronautical Society


References

Xing, D.J., Zhen, Z.Y. and Gong, H.J. Offense-defense confrontation decision making for dynamic UAV swarm versus UAV swarm, Proc. Inst. Mech. Eng. G J. Aerosp. Eng., 2019, 233, (15), pp 5689–5702. https://doi.org/10.1177/0954410019853982
Zhang, J. and Xing, J.H. Cooperative task assignment of multi-UAV system, Chin. J. Aeronaut., 2020. https://doi.org/10.1016/j.cja.2020.02.009
Wang, C., Wu, L.Z., Yan, C., et al. Coactive design of explainable agent-based task planning and deep reinforcement learning for human-UAVs teamwork, Chin. J. Aeronaut., 2020. https://doi.org/10.1016/j.cja.2020.05.001
Imanberdiyev, N., Fu, C., Kayacan, E., et al. Autonomous navigation of UAV by using real-time model-based reinforcement learning, 14th International Conference on Control, Automation, Robotics and Vision (ICARCV 2016). https://doi.org/10.1109/ICARCV.2016.7838739
Wu, Y.H., Yu, Z.C., Li, C.Y., et al. Reinforcement learning in dual-arm trajectory planning for a free-floating space robot, Aerosp. Sci. Technol., 2020, 98. https://doi.org/10.1016/j.ast.2019.105657
Dong, Y.Q., Ai, J.L. and Liu, J.Q. Guidance and control for own aircraft in the autonomous air combat: A historical review and future prospects, Proc. Inst. Mech. Eng. G J. Aerosp. Eng., 2019, 233, (16), pp 5943–5991. https://doi.org/10.1177/0954410019889447
Sun, Z., Chao, T., Wang, S., et al. Ascent trajectory tracking method using time-varying quadratic adaptive dynamic programming, Proc. Inst. Mech. Eng. G J. Aerosp. Eng., 2018, 233, (11), pp 4154–4165. https://doi.org/10.1177/0954410018817613
Xu, G.T., Long, T., Wang, Z., et al. Target-bundled genetic algorithm for multi-unmanned aerial vehicle cooperative task assignment considering precedence constraints, Proc. Inst. Mech. Eng. G J. Aerosp. Eng., 2019, 234, (3), pp 760–773. https://doi.org/10.1177/0954410019883106
Zhao, E.J., Chao, T., Wang, S.Y., et al. Multiple flight vehicles cooperative guidance law based on extended state observer and finite time consensus theory, Proc. Inst. Mech. Eng. G J. Aerosp. Eng., 2016, 232, (2), pp 270–279. https://doi.org/10.1177/0954410016683734
Lowe, R., Wu, Y., Tamar, A., et al. Multi-agent Actor-Critic for mixed cooperative-competitive environments, 2018. arXiv: 1706.02275v3.
Liu, Y.X., Liu, H., Tian, Y.L., et al. Reinforcement learning based two-level control framework of UAV swarm for cooperative persistent surveillance in an unknown urban area, Aerosp. Sci. Technol., 2020, 98, p 105671. https://doi.org/10.1016/j.ast.2019.105671
Zhen, Z.Y., Xing, D.J. and Gao, C. Cooperative search-attack mission planning for multi-UAV based on intelligent self-organized algorithm, Aerosp. Sci. Technol., 2018, 76, pp 402–411. https://doi.org/10.1016/j.ast.2018.01.035
Yao, P., Wang, H.L. and Su, Z.K. Cooperative path planning with applications to target tracking and obstacle avoidance for multi-UAVs, Aerosp. Sci. Technol., 2016, 54, pp 10–22. https://doi.org/10.1016/j.ast.2016.04.002
Wang, C., Li, J., Jing, N., et al. A distributed cooperative dynamic task planning algorithm for multiple satellites based on multi-agent hybrid learning, Chin. J. Aeronaut., 2011, 24, (4), pp 493–505. https://doi.org/10.1016/S1000-9361(11)60057-5
Sun, G.B., Zhou, R., Xu, K., et al. Cooperative formation control of multiple aerial vehicles based on guidance route in a complex task environment, Chin. J. Aeronaut., 2020, 33, (2), pp 701–720. https://doi.org/10.1016/j.cja.2019.08.009
Fu, X.W., Pan, J., Wang, H.X., et al. A formation maintenance and reconstruction method of UAV swarm based on distributed control, Aerosp. Sci. Technol., 2020, 104, p 105981. https://doi.org/10.1016/j.ast.2020.105981
Fu, X.W., Pan, J., Wang, H.X., et al. A formation maintenance and reconstruction method of UAV swarm based on distributed control with obstacle avoidance, Australian and New Zealand Control Conference (ANZCC), 2019. https://doi.org/10.1109/ANZCC47194.2019.8945601
La, H.M., Nguyen, T., Le, T.D., et al. Formation control and obstacle avoidance of multiple rectangular agents with limited communication ranges, IEEE Trans. Control Network Syst., 2017, 4, (4), pp 680–691. https://doi.org/10.1109/TCNS.2016.2542978
La, H.M. and Sheng, W. Dynamic target tracking and observing in a mobile sensor network, Robot. Autonom. Syst., 2012, 60, (7), pp 996–1009. https://doi.org/10.1016/j.robot.2012.03.006
Degas, A., Rantrua, A., Kaddoum, E., et al. Dynamic collision avoidance using local cooperative airplanes decisions, CEAS Aeronaut. J., 2020, 11, pp 309–320. https://doi.org/10.1007/s13272-019-00400-6
Busoniu, L., Babuska, R. and Schutter, B.D. Multi-agent reinforcement learning: An overview, in Srinivasan, D. and Jain, L.C. (eds), Innovations in Multi-Agent Systems and Applications – 1, Studies in Computational Intelligence, vol. 310, Springer, Berlin, Heidelberg, 2010, pp 183–221. https://doi.org/10.1007/978-3-642-14435-6_7
Musavi, N., Onural, D., Gunes, K., et al. Unmanned aircraft systems airspace integration: A game theoretical framework for concept evaluations, J. Guid. Control Dyn., 2017, 40, (1), pp 96–109. https://doi.org/10.2514/1.G000426
Petar, K., Sylvain, C. and Darwin, C. Reinforcement learning in robotics: Applications and real-world challenges, Robotics, 2013, 2, (3), pp 122–148. https://doi.org/10.3390/robotics2030122
Das-Stuart, A., Howell, K.C. and Folta, D. Rapid trajectory design in complex environments enabled by reinforcement learning and graph search strategies, Acta Astronaut., 2019, 171, pp 172–195. https://doi.org/10.1016/j.actaastro.2019.04.037
Jiang, J.X., Zeng, X.Y., Guzzetti, D., et al. Path planning for asteroid hopping rovers with pre-trained deep reinforcement learning architectures, Acta Astronaut., 2020, 171, pp 265–279. https://doi.org/10.1016/j.actaastro.2020.03.007
Mnih, V., Kavukcuoglu, K., Silver, D., et al. Human-level control through deep reinforcement learning, Nature, 2015, 518, pp 529–533. https://doi.org/10.1038/nature14236
Wang, Z.Y., Freitas, N.D. and Lanctot, M. Dueling network architectures for deep reinforcement learning, Proceedings of the International Conference on Machine Learning, New York, USA, April 2016, pp 1995–2003. arXiv: 1511.06581v3.
Hausknecht, M. and Stone, P. Deep recurrent Q-learning for partially observable MDPs, Association for the Advancement of Artificial Intelligence (AAAI 2015), 2017. arXiv: 1507.06527v4.
Yang, X.X. and Wei, P. UAV navigation in high dynamic environments: A deep reinforcement learning approach, Chin. J. Aeronaut., 2020. https://doi.org/10.1016/j.cja.2020.05.011
Silver, D., Lever, G., Heess, N., et al. Deterministic policy gradient algorithms, Proceedings of the International Conference on Machine Learning, vol. 32, 2014, pp 387–395.
Duryea, E., Ganger, M. and Hu, W. Exploring deep reinforcement learning with multi Q-learning, Intell. Cont. Automat., 2016, 7, (4), pp 129–144. https://doi.org/10.4236/ica.2016.74012
Littman, M.L. Markov games as a framework for multi-agent reinforcement learning, Proceedings of the 11th International Conference on Machine Learning (ICML 1994), Rutgers University, New Brunswick, NJ, July 1994, pp 157–163. https://doi.org/10.1016/B978-1-55860-335-6.50027-1
Gong, L.G., Wang, Q., Hu, C.H., et al. Switching control of morphing aircraft based on Q-learning, Chin. J. Aeronaut., 2020, 33, (2), pp 672–687. https://doi.org/10.1016/j.cja.2019.10.005
Peters, J. and Schaal, S. Policy gradient methods for robotics, International Conference on Intelligent Robots and Systems, 2007. https://doi.org/10.1109/IROS.2006.282564
Babuska, R., Busoniu, L. and Schutter, B.D. Reinforcement learning for multi-agent systems, Proceedings of the 11th International Conference on Emerging Technologies and Factory Automation (ETFA 2006), IEEE, Prague, Czech Republic, 2006. http://www.dcsc.tudelft.nl
Nguyen, T.T., Nguyen, N.D. and Nahavandi, S. Deep reinforcement learning for multi-agent systems: A review of challenges, solutions and applications, 2019. arXiv: 1812.11794v2.
Li, C.G., Wang, M. and Yuan, Q.N. A multi-agent reinforcement learning using Actor-Critic methods, Proceedings of the 7th International Conference on Machine Learning and Cybernetics, 2008. https://doi.org/10.1109/ICMLC.2008.4620528
Gupta, J.K., Egorov, M. and Kochenderfer, M. Cooperative multi-agent control using deep reinforcement learning, in Sukthankar, G. and Rodriguez-Aguilar, J. (eds), International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017), Lecture Notes in Computer Science, vol. 10642, Springer, Cham, 2017, pp 66–83. https://doi.org/10.1007/978-3-319-71682-4_5
Guo, H.L. and Meng, Y. Distributed reinforcement learning for coordinate multi-robot foraging, J. Intell. Robot Syst., 2010, 60, pp 531–551. https://doi.org/10.1007/s10846-010-9429-4
Lowe, R., Wu, Y., Tamar, A., et al. Multi-agent Actor-Critic for mixed cooperative-competitive environments, Proceedings of the Neural Information Processing Systems (NIPS 2017). arXiv: 1706.02275v3.
Lillicrap, T.P., Hunt, J.J., Pritzel, A., et al. Continuous control with deep reinforcement learning, International Conference on Learning Representations, 2015, pp 1–14. arXiv: 1509.02971.
Nagabandi, A., Kahn, G., Fearing, R.S., et al. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning, 2017. arXiv: 1708.02596v2.
Yang, Z., Merrick, K., Abbass, H., et al. Multi-task deep reinforcement learning for continuous action control, Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017, pp 3301–3307. https://doi.org/10.24963/ijcai.2017/461
Baker, B., Gupta, O., Naik, N., et al. Designing neural network architectures using reinforcement learning, International Conference on Learning Representations, 2017. arXiv: 1611.02167v2.
Liu, Q.H., Liu, X.F. and Cai, G.P. Control with distributed deep reinforcement learning: Learn a better policy, 2018. arXiv: 1811.10264v2.
Goecks, V.G., Leal, P.B., White, T., et al. Control of morphing wing shapes with deep reinforcement learning, 2018 AIAA Information Systems-AIAA Infotech @ Aerospace, Kissimmee, Florida, January 2018. https://doi.org/10.2514/6.2018-2139
Wen, N., Liu, Z.H., Zhu, L.P., et al. Deep reinforcement learning and its application on autonomous shape optimization for morphing aircrafts, J. Astronaut., 2017, 38, pp 1153–1159. https://doi.org/10.3873/j.issn.1000-1328.2017.11.003
Xu, D., Hui, Z., Liu, Y.Q., et al. Morphing control of a new bionic morphing UAV with deep reinforcement learning, Aerosp. Sci. Technol., 2019, 92, pp 232–243. https://doi.org/10.1016/j.ast.2019.05.058
La, H.M. Multi-robot swarm for cooperative scalar field mapping, Handbook of Research on Design, Control, and Modeling of Swarm Robotics, 2015. https://doi.org/10.4018/978-1-4666-9572-6.ch014
La, H.M., Sheng, W. and Chen, J. Cooperative and active sensing in mobile sensor networks for scalar field mapping, IEEE Trans. Syst. Man Cybern. Syst., 2015, 45, (1), pp 1–12. https://doi.org/10.1109/TSMC.2014.2318282
Adepegba, A.A., Miah, S. and Spinello, D. Multi-agent area coverage control using reinforcement learning, Proceedings of the 29th International Florida Artificial Intelligence Research Society Conference, 2016, pp 368–373. http://dx.doi.org/10.20381/ruor-5715
Pham, H.X., La, H.M., Feil-Seifer, D., et al. Cooperative and distributed reinforcement learning of drones for field coverage, 2018. arXiv: 1803.07250v1.
Supplementary material

Xu and Chen supplementary material (File, 3.7 MB)