A deep reinforcement learning-based approach to onboard trajectory generation for hypersonic vehicles

C.Y. Bao; X. Zhou; P. Wang; R.Z. He; G.J. Tang

doi:10.1017/aer.2023.4


Published online by Cambridge University Press: 08 February 2023

Affiliation (all authors): College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, 410073, China
Corresponding author: P. Wang. Email: wangpeng_nudt@qq.com

Abstract

An onboard three-dimensional (3D) trajectory generation approach based on a reinforcement learning (RL) algorithm and a deep neural network (DNN) is proposed for hypersonic vehicles in the glide phase. Multiple trajectory samples are generated offline by convex optimisation. Deep learning (DL) is used to pre-train the DNN, which initialises the actor network and accelerates the RL process. Based on the offline deep policy deterministic actor-critic algorithm, a flight target-oriented reward function with path constraints is designed. The actor network is optimised by end-to-end RL and the policy gradients of the critic network until the reward function converges to its maximum. The trained actor network then serves as the onboard trajectory generator, computing optimal control values online from the real-time motion states. Simulation results show that the single-step online planning time meets the real-time requirements of onboard trajectory generation. A significant improvement in the terminal accuracy of the online trajectory and better generalisation under biased initial states are observed for hypersonic vehicles in the glide phase.
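To make the idea of a "flight target-oriented reward function with path constraints" concrete, the following is a minimal Python sketch and not the authors' actual reward. It assumes that the path constraints are heating rate, dynamic pressure and load factor (common choices in glide trajectory planning, but not named in the abstract). All function names, limits, weights and scales below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
import math


@dataclass
class PathLimits:
    """Hypothetical path-constraint limits (illustrative values, not from the paper)."""
    max_heating_rate: float = 1.0e6      # W/m^2
    max_dynamic_pressure: float = 1.0e5  # Pa
    max_load_factor: float = 3.0         # g


def constraint_violation(value: float, limit: float) -> float:
    """Normalised amount by which a quantity exceeds its limit (0 if satisfied)."""
    return max(0.0, value / limit - 1.0)


def step_reward(range_to_go_prev: float, range_to_go: float,
                heating_rate: float, dynamic_pressure: float, load_factor: float,
                limits: PathLimits, penalty_weight: float = 10.0) -> float:
    """Per-step reward: progress towards the target minus path-constraint penalties."""
    # Target-oriented term: fraction of the remaining range closed in this step.
    progress = (range_to_go_prev - range_to_go) / max(range_to_go_prev, 1e-9)
    # Penalty term: sum of normalised violations of the assumed path constraints.
    violation = (
        constraint_violation(heating_rate, limits.max_heating_rate)
        + constraint_violation(dynamic_pressure, limits.max_dynamic_pressure)
        + constraint_violation(load_factor, limits.max_load_factor)
    )
    return progress - penalty_weight * violation


def terminal_reward(range_error: float, altitude_error: float,
                    range_scale: float = 1.0e4, altitude_scale: float = 1.0e3) -> float:
    """Terminal bonus that is largest (2.0) when both terminal errors are zero."""
    return (math.exp(-abs(range_error) / range_scale)
            + math.exp(-abs(altitude_error) / altitude_scale))
```

In an actor-critic setting, a reward of this shape would be accumulated along each simulated glide trajectory; the penalty weight controls how strongly constraint violations are discouraged relative to progress towards the target.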

Keywords

Deep reinforcement learning; Hypersonic vehicle; Gliding phase; 3D trajectory generation; Deep neural network

Type: Research Article
Information: The Aeronautical Journal, Volume 127, Issue 1315, September 2023, pp. 1638-1658

DOI: https://doi.org/10.1017/aer.2023.4
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of Royal Aeronautical Society

