
Four-Dimensional Trajectory Generation for UAVs Based on Multi-Agent Q Learning

Published online by Cambridge University Press: 12 February 2020

Wenjie Zhao
Affiliation: School of Aeronautics and Astronautics, Zhejiang University, Hangzhou, Zhejiang Province, China

Zhou Fang
Affiliation: School of Aeronautics and Astronautics, Zhejiang University, Hangzhou, Zhejiang Province, China

Zuqiang Yang*
Affiliation: Information Science Academy of China Electronics Technology Group Corporation, Beijing, China

*E-mail: gaayzq@126.com

Abstract

A distributed four-dimensional (4D) trajectory generation method based on multi-agent Q learning is presented for multiple unmanned aerial vehicles (UAVs). With this method, each vehicle can intelligently generate collision-free 4D trajectories for time-constrained cooperative flight tasks. For a single UAV, the 4D trajectory is generated by a bionic improved tau gravity guidance strategy, which synchronously guides the position and velocity to their desired values at the arrival time. Furthermore, to optimise the trajectory parameters, wire fitting neural network Q (WFNNQ) learning, which handles continuous state and action spaces, is applied. For multi-UAV applications, the learning is organised by the win or learn fast policy hill-climbing (WoLF-PHC) algorithm. Dynamic simulation results show that the proposed method can efficiently provide 4D trajectories for a multi-UAV system in challenging simultaneous arrival tasks, and that the fully trained method can be reused in similar trajectory generation scenarios.
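The abstract names the improved tau gravity guidance strategy, but the full derivation sits behind the access wall. As a rough orientation only, the sketch below implements the classical tau-G coupling from general tau theory that such strategies build on: the gap's time-to-closure (tau) is locked to that of an intrinsic gravity guide, so position and velocity reach their desired values together at the arrival time T. The scalar gap, the coupling constant k and the guide acceleration g are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

# Minimal sketch of tau-G coupled gap closure (classical tau theory).
# Assumptions (not from the paper): scalar gap, constant coupling k,
# guide acceleration g = 9.81; the paper's improved strategy may differ.

def tau_g_trajectory(x0, T, k=0.4, g=9.81, n=200):
    """Close an initial gap x0 in time T by coupling the gap's tau to
    the intrinsic gravity guide: tau_x(t) = k * tau_g(t).

    The gravity-guide gap is x_g(t) = 0.5 * g * (T**2 - t**2), so the
    coupled solution is x(t) = x0 * (x_g(t) / x_g(0))**(1/k), which
    reaches zero gap (and, for 0 < k < 0.5, zero velocity) at t = T.
    """
    t = np.linspace(0.0, T, n)
    xg = 0.5 * g * (T**2 - t**2)          # intrinsic guide gap
    x = x0 * (xg / xg[0])**(1.0 / k)      # tau-coupled action gap
    v = np.gradient(x, t)                 # numerical velocity profile
    return t, x, v

# Example: close a 100 m gap in 20 s with a soft (zero-velocity) arrival.
t, x, v = tau_g_trajectory(x0=100.0, T=20.0, k=0.4)
print(f"final gap {x[-1]:.3f} m, final speed {v[-1]:.3f} m/s")
```

For 0 < k < 0.5 the gap closes with both velocity and deceleration going to zero at T, which is what makes tau-G guidance attractive for soft, time-constrained arrivals.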
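WFNNQ learning follows the wire fitting approach of Gaskett et al. (1999): for each state, a neural network outputs a small set of candidate actions ("wires") with associated Q values, and a fixed interpolator then defines Q(s, a) for any continuous action. The sketch below shows only the interpolation step; the function name, the constants c and eps, and the toy wires are assumptions for illustration.

```python
import numpy as np

def wire_fit_q(action, wires_a, wires_q, c=0.01, eps=1e-6):
    """Wire-fitting interpolation of Q(s, a) from n (action, q) wires
    for the current state: an inverse-distance weighting that is pulled
    toward the highest-valued wire."""
    dist = np.sum((wires_a - action)**2, axis=1)            # ||a - a_i||^2
    w = 1.0 / (dist + c * (wires_q.max() - wires_q) + eps)  # wire weights
    return np.sum(w * wires_q) / np.sum(w)

# Example: three wires in a 2-D continuous action space for some state.
wires_a = np.array([[0.0, 0.0], [0.5, 0.2], [1.0, 1.0]])
wires_q = np.array([0.1, 0.8, 0.3])
print(wire_fit_q(np.array([0.4, 0.3]), wires_a, wires_q))
```

Because the interpolation attains its maximum at the highest-valued wire, the greedy continuous action can be read off directly from the network output, which is what makes Q learning tractable with continuous actions.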
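WoLF-PHC is Bowling and Veloso's "win or learn fast" variant of policy hill-climbing: each agent adapts its policy cautiously while winning and quickly while losing, which stabilises concurrent multi-agent learning. A minimal tabular sketch follows, assuming discretised states and actions (the paper instead pairs the rule with continuous WFNNQ learning); all hyperparameter values are illustrative.

```python
import numpy as np

class WoLFPHCAgent:
    """Tabular WoLF-PHC sketch; hypothetical discretisation, not the
    paper's continuous-state formulation."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.Q = np.zeros((n_states, n_actions))
        self.pi = np.full((n_states, n_actions), 1.0 / n_actions)
        self.pi_avg = self.pi.copy()          # running average policy
        self.counts = np.zeros(n_states)
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose

    def act(self, s, rng=np.random):
        return rng.choice(len(self.pi[s]), p=self.pi[s])

    def update(self, s, a, r, s_next):
        # 1. Ordinary Q-learning backup.
        td_target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])

        # 2. Update the average policy for state s.
        self.counts[s] += 1
        self.pi_avg[s] += (self.pi[s] - self.pi_avg[s]) / self.counts[s]

        # 3. 'Win or learn fast': small step when the current policy
        #    outperforms the average policy, large step otherwise.
        winning = self.pi[s] @ self.Q[s] > self.pi_avg[s] @ self.Q[s]
        delta = self.delta_win if winning else self.delta_lose

        # 4. Hill-climb toward the greedy action, then re-normalise.
        n = len(self.pi[s])
        greedy = self.Q[s].argmax()
        self.pi[s] -= delta / (n - 1)
        self.pi[s][greedy] += delta + delta / (n - 1)
        self.pi[s] = np.clip(self.pi[s], 0.0, None)
        self.pi[s] /= self.pi[s].sum()
```

The two step sizes (delta_win < delta_lose) implement the "learn fast when losing" principle that lets several simultaneously learning agents converge.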

Type: Research Article
Copyright: © The Royal Institute of Navigation 2020

