Nomenclature
- LOS: line of sight
- V: velocity
- q: pitch rate
- ${n_L}$: normal acceleration
- $l$: reference length
- S: reference area
- m: mass of the missile
- ${I_{yy}}$: moment of inertia about the pitch axis
- ${T_\alpha }$: turning rate time constant
- $r$: range along the LOS
- ${V_r}$: projection of the relative velocity along the LOS
- ${V_\lambda }$: projection of the relative velocity orthogonal to the LOS
- ${A_{Tr}}$: projection of the target acceleration along the LOS
- ${A_{T\lambda }}$: projection of the target acceleration orthogonal to the LOS
- ${x_M}$: x-coordinate of the missile
- ${y_M}$: y-coordinate of the missile
- ${z_M}$: z-coordinate of the missile
- ${x_T}$: x-coordinate of the target
- ${y_T}$: y-coordinate of the target
- ${z_T}$: z-coordinate of the target
- STT: skid-to-turn
- IGC: integrated guidance and control
- FTIGC: fault-tolerant integrated guidance and control
- NAIGC: non-affine integrated guidance and control
- FTC: fault-tolerant control
- RL: reinforcement learning
- RBFNN: radial basis function neural network
- BS-FTNAIGC: backstepping fault-tolerant non-affine integrated guidance and control
- ABE-FTNAIGC: adaptive boundary estimation fault-tolerant non-affine integrated guidance and control
- RBF-FTNAIGC: radial basis function fault-tolerant non-affine integrated guidance and control
Greek symbols
- $\alpha $: angle-of-attack
- $\theta $: pitch angle
- ${\gamma _M}$: track angle
- $\rho $: density of air
- $\lambda $: LOS angle
1.0 Introduction
The integrated guidance and control (IGC) design method has gained significant attention since it was first proposed, owing to its ability to maximise the missile’s flight performance [Reference Guo, Wang, Hu and Guo1] and overall operational effectiveness. By leveraging the coupling between the guidance and control systems, it offers the combined advantages of low design cost and strong robustness, and numerous studies have explored this method. Several approaches have been investigated for integrated guidance and control design, including sliding mode control based on sliding surface design [Reference Wu, Lu and Wang2], optimal control methods [Reference Wang, Zhang, Lin and Li3], feedback linearisation [Reference Xu, Chen and Wang4], the backstepping control method [Reference Hu, Wei and Wang5], adaptive control [Reference Jiang, qing Tian, yan Sun and ge Liang6], and active disturbance rejection control [Reference Zhao, Cao and Huang7]. During flight, missile systems commonly encounter various failures, such as rudder surface failures, sensor failures and other mechanical failures. Recent research has therefore focused on fault-tolerant control (FTC) challenges [Reference Wang, Yuan, Pan and Wei8–Reference Wang and Yuan10]. For instance, an adaptive barrier fast terminal sliding mode control method was proposed to mitigate actuator failures in unmanned aerial vehicles [Reference Najafi, Vu, Mobayen, Asad and Fekih11]. For multi-agent systems with node failures and switching topologies, a distributed adaptive fuzzy fault-tolerant control method has been suggested [Reference Zhao, Zhao and Che12]. FTC-related studies of IGC systems remain scarce: one study addressed elevator and rudder failures within a strict-feedback IGC structure [Reference Wang and Yuan10], Ashrafifar and Jegarkandi [Reference Ashrafifar and Jegarkandi13] considered a burned or broken tailplane failure and developed an IGC system for a ground-to-air missile, and Zhao [Reference Zhao14] proposed a fault-tolerant control method for handling loss of rudder surface effectiveness.
Overall, the fault-tolerant integrated guidance and control (FTIGC) design method has attracted extensive research effort, and various approaches have been explored for integrated guidance and control system design. Nevertheless, the fault-tolerant controllers discussed above rely on transforming the vehicle model into an affine-in-the-input form; the case in which the input of every subsystem is fully non-affine is not considered. In reality, a non-affine description represents the system more realistically. In addition, there is currently no fault-tolerant control design for the non-affine form of the IGC model; only Ref. [Reference Chen15] considers non-affine aerodynamic characteristics in building the IGC model of an STT missile. Undoubtedly, in practical engineering applications, many parameters of the missile flight and guidance system, such as torque, exhibit non-affine characteristics. Therefore, in this paper we address the aforementioned problem by developing a new NAIGC scheme for missiles subject to rapidly changing actuator failures and multiple uncertainties from different sources. We describe the missile dynamics and missile-target engagement kinematics as a non-affine nonlinear IGC system, which is more consistent with practical engineering applications.
Reinforcement learning (RL) has gained significant attention as a learning control method owing to its ability to deal with unknown uncertainties, and it has been extensively researched in recent years [Reference Liu, Li, Tong and Chen16–Reference Peng, Hu, Shi, Luo, Huang, Ghosh and Huang20]. In Ref. [Reference Yang, Modares, Wunsch and Yin21], RL was introduced to address the distributed leader-follower output synchronisation problem for linear heterogeneous systems. Moreover, actor-critic structures are frequently employed in reinforcement learning for uncertain systems [Reference Fan, Yang and Ye22, Reference Hu, Li, Xue and Liu23]: the critic network receives information about the system from the task environment and provides a cost function to evaluate the control performance, and, based on this cost function, the actor network generates the next control policy for the actuator. Ouyang et al. [Reference Ouyang, Dong, Wei and Sun24] designed an actor-critic adaptive control method for tracking control of an uncertain elastic joint robot, in which the critic neural network approximates the cost function while the actor neural network handles the system uncertainty and generates the control input for the actuator. Liu et al. [Reference Liu, Shan, Rong and Zheng25] proposed an incremental reinforcement learning control method with an adaptive learning rate to improve the success rate of flight controllers. A distributed reinforcement learning guidance strategy under angle-of-attack constraints was investigated in Ref. [Reference Bohao, Xuman, Xiaofei, Yunjie and Guofei26]. Pei et al. [Reference Pei, Shao-ming, Jiang and De-fu27] used the deep deterministic policy gradient algorithm to cast integrated guidance and control as a reinforcement learning problem, intercepting targets with agents generated by reinforcement learning, and numerically verified the effectiveness and robustness of the method. In Ref. [Reference Song, Luo, Zhao, Hu and Zhang28], the IGC system was modelled as a reinforcement learning process based on a three-degree-of-freedom motion model of a hypersonic vehicle in the longitudinal plane, and a proximal policy optimisation algorithm-based IGC system was designed. It can be seen that the actor-critic reinforcement learning architecture performs satisfactorily in controlling vehicle and guidance systems. However, the IGC model exhibits intricate nonlinear dynamics, encompassing nonlinear relationships, non-affine terms and uncertain disturbances, while the IGC problem simultaneously demands a high level of certainty from the control scheme, i.e., precise system control along the predetermined planned trajectory. The applicability of the actor-critic reinforcement learning approach to such complex models has yet to be investigated.
Sensor measurement bias, as well as the actuator effectiveness loss and bias faults generated during flight, can disturb the attitude control system, which makes the controller design more challenging. In addition, external disturbances and structural uncertainties should be considered, as they also cause difficulties during the controller design process. Inspired by the above, this paper focuses on the FTIGC problem for a class of non-affine systems with structural uncertainties, actuator failures and external disturbances, i.e., the NAIGC system. The main challenges are how to model the NAIGC system and how to handle the non-affine structure, the various unknown uncertainties and the time-varying faults. By introducing an adaptive expansion integral system to deal with the non-affine problem, and by fully combining the reinforcement learning actor-critic architecture with the approximation capability of the radial basis function neural network (RBFNN), the unknown uncertainties and faults can be handled well using bounded adaptive control techniques. Compared with existing results, the method proposed in this paper makes the following contributions:
• To the best of the authors’ knowledge, this is the first time an actor-critic method is applied to the design of adaptive fault-tolerant NAIGC, and it provides an effective means of handling multiple uncertainties.
• A new non-affine integrated guidance and control design model is established for a class of STT missiles with actuator failures and multiple uncertainties, and it can be extended to other aircraft with non-affine structures.
• Benefiting from the combination of adaptive boundary estimation, the RBFNN and the actor-critic architecture, the missile’s ability to respond to actuator failures and target manoeuvres is greatly improved.
2.0 Problem formulation and preliminaries
2.1 Non-affine IGC model
Consider the following nonlinear longitudinal model of the missile, in which gravity and the coupling between the longitudinal and lateral channels are neglected:
where $c_x$, $c_{z1}$, $c_{z2}$, $c_{m1}$, $c_{m2}$, $c_{m\delta_e}$ denote the aerodynamic coefficients.
The kinematics of a planar missile intercept target can be described as [Reference Wang, Xiong, Wang, Song and Lai29]:
Considering the actuator fault, let u(t) denote the commanded actuator input. The output of the faulty actuator is then expressed as
where $d_\delta$ represents the bias of the actuator fault and $\lambda_\delta$ represents the scale factor of the actuator gain fault. Assume that $d_\delta$ is a bounded unknown variable and that $\lambda_\delta$ takes values in the interval [0,1]. According to the corollary in Ref. [Reference Wang and Yuan10], when $V_{\lambda}\rightarrow k_{0}\sqrt{r}$, where $k_0 > 0$ is a constant, a direct hit can be obtained. Thus, by defining $\chi =V_{\lambda }-k_{0}\sqrt{r}$, we can obtain the time derivative of χ.
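For concreteness, the following minimal sketch computes r, λ, $V_r$, $V_\lambda$ and the guidance variable χ from missile and target states, assuming the standard planar relative-motion definitions (the kinematic equations of Ref. [Reference Wang, Xiong, Wang, Song and Lai29] are not reproduced in this excerpt); the function names and the value of $k_0$ are illustrative, and the gain-loss-plus-bias actuator fault model is likewise an assumed common form consistent with the description above.

```python
import numpy as np

def engagement_states(p_m, v_m, p_t, v_t, k0=0.1):
    """Planar engagement variables under the standard relative-motion definitions.

    p_m, v_m : missile position/velocity in the vertical plane, shape (2,)
    p_t, v_t : target position/velocity, shape (2,)
    k0       : positive constant in the guidance variable chi = V_lambda - k0*sqrt(r)
    """
    dp = p_t - p_m                            # relative position
    dv = v_t - v_m                            # relative velocity
    r = np.linalg.norm(dp)                    # range along the LOS
    lam = np.arctan2(dp[1], dp[0])            # LOS angle
    e_los = dp / r                            # unit vector along the LOS
    e_perp = np.array([-e_los[1], e_los[0]])  # unit vector orthogonal to the LOS
    V_r = dv @ e_los                          # relative velocity along the LOS
    V_lam = dv @ e_perp                       # relative velocity orthogonal to the LOS
    chi = V_lam - k0 * np.sqrt(r)             # guidance variable chi
    return r, lam, V_r, V_lam, chi

def faulty_actuator(u, lam_delta, d_delta):
    """Assumed gain-loss-plus-bias fault model: actual deflection = lam_delta*u + d_delta."""
    return lam_delta * u + d_delta

# Example: target ahead of and above the missile
print(engagement_states(p_m=np.array([0.0, 0.0]),       v_m=np.array([600.0, 50.0]),
                        p_t=np.array([9000.0, 6000.0]), v_t=np.array([-300.0, 0.0])))
```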
Meanwhile, according to the kinetic equations of $n_L$ and $q$, we can obtain
Define $x_1 = \chi$, $x_2 = n_L$, $x_3 = q$ and
We can get the following non-affine system,
where $x_i \in R$, $i = 1, 2, 3$, are the system state variables and $y = x_1$ is the system output. $\Delta f_i$ and $\Delta b_i$, $i = 1, 2, 3$, are the uncertainties caused by measurement errors; in fact, $\Delta b_i \in [-0.5,+0.5]$. Obviously, the integrated guidance and control design model is a third-order non-affine system, and the relationship between the outer and inner loops is shown schematically in Fig. 1.
Remark 1. The non-affine IGC model established in this paper is more general: the aerodynamic characteristics of the missile and the rate of change of the deflection are considered in the form of non-affine functions, so that each subsystem contains a non-affine input. This is of great reference value for practical engineering applications, but it also makes the controller design more difficult.
The design goal of this paper is a class of RBFNN and actor-critic based adaptive controllers such that the guidance variable converges to a neighbourhood of zero in the presence of multiple factors, including actuator failures, simultaneously varying unknown target acceleration and coupled multi-source uncertainty, and such that the relevant gain parameters in the controllers remain bounded.
In this article, the following lemmas and assumptions are necessary:
Lemma 1. [Reference Wang and Yuan30] For any $\varepsilon$ > 0 and any z $\in$ R, the following inequality holds:
where κ is the positive number satisfying κ = e −(κ+1), i.e., κ ≈ $0.2785$.
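As a quick sanity check of this bound, the following sketch evaluates the gap $|z| - z\tanh(z/\varepsilon)$ on a grid; the choice of ε and the grid resolution are arbitrary.

```python
import numpy as np

# Numerical check of Lemma 1: 0 <= |z| - z*tanh(z/eps) <= kappa*eps with kappa ~ 0.2785.
kappa, eps = 0.2785, 0.5
z = np.linspace(-10.0, 10.0, 2001)
gap = np.abs(z) - z * np.tanh(z / eps)
assert np.all(gap >= -1e-12) and np.all(gap <= kappa * eps + 1e-9)
print(gap.max(), kappa * eps)   # the maximum gap approaches kappa*eps
```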
Lemma 2. [Reference Huang31] (Cauchy inequality) Let $\|\bullet\|$ denote the Euclidean norm of a vector, i.e. $||x||=\sqrt{{\bf x}^{T}{\bf x}}$ . For all ${\bf x},{\bf y}\in R^{m}$ , the following inequality holds:
Lemma 3. [Reference Xia, Lian, Su, Shen and Chen32] (Young’s inequality) Given positive constants p and q satisfying 1/p + 1/q = 1, for any x, y $\in$ R and any $\varepsilon$ > 0, the following inequality holds:
Lemma 4. [Reference He and Dong33] Consider a Lyapunov function V(t) with bounded initial condition V(0). If the derivative of V(t) satisfies
where $C_V$ and $E_V$ are positive constants, then V(t) is bounded.
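For completeness, the bound behind Lemma 4 follows from the standard comparison argument; assuming the usual inequality $\dot{V}(t) \le -C_V V(t) + E_V$, a sketch of the resulting estimate is:

```latex
% Multiplying \dot{V} \le -C_V V + E_V by e^{C_V t} and integrating from 0 to t gives
\begin{equation*}
V(t) \;\le\; V(0)\,e^{-C_V t} \;+\; \frac{E_V}{C_V}\left(1 - e^{-C_V t}\right)
\;\le\; V(0) + \frac{E_V}{C_V},
\end{equation*}
% so V(t) stays bounded and ultimately enters the set \{ V \le E_V / C_V \}.
```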
Lemma 5. [Reference Wang, Yuan, Pan and Che34] For any constant $\varepsilon$ > 0 and any variable z $\in$ R, the following relationship holds
Assumption 1. There exist positive constants $\underline{g}$ and $\bar{g}$ , the following inequality holds
where $\bar{x}_{i+1}=[x_{1},\ldots, x_{i+1}]$, $i=1,\ldots, n$, and $x_{n+1}=u$.
Remark 2. This assumption is introduced to make the whole system controllable.
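Since the inequality of Assumption 1 appears as a display equation in the original, the following sketch records a common form of such a controllability-type condition for non-affine systems; it is stated here as an assumption about the usual form, not as a reproduction of the paper’s exact statement:

```latex
\begin{equation*}
0 < \underline{g} \;\le\; \left|\frac{\partial f_i\!\left(\bar{x}_{i+1}\right)}{\partial x_{i+1}}\right| \;\le\; \bar{g},
\qquad i = 1,\ldots,n, \quad x_{n+1} = u.
\end{equation*}
```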
2.2 Neural networks for approximation
In this paper, we need to approximate the system uncertainty and the unknown cost function. The system uncertainty is estimated using an RBFNN, which is a three-layer network: the first layer is the input layer and the second is the hidden layer (an RBFNN generally contains only one hidden layer, and the weights from the input layer to the hidden layer are all 1), while the third layer is the output layer. In this paper, we use the following Gaussian function, denoted $\phi(x)$, as the radial basis function, where $\mu_j$ is the centre of the j-th hidden node and $\sigma_j$ is its width:
The final output is defined as
where $x \in R^n$ and $y \in R$ are the input and output of the RBFNN, respectively, $\hat{W} = [w_1,\ldots,w_m]^T \in R^m$ denotes the output-layer weight vector, and m represents the number of hidden nodes. $\Phi(x) = [\Phi_1(x),\ldots,\Phi_m(x)]^T$, where:
It has been shown that, for a smooth function, there exists an optimal weight [Reference Yu, Long, Chen and Wang35] such that
where $\varepsilon$ (x) is the approximation error, which can be made arbitrarily small by increasing the number of nodes in the hidden layer.
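A minimal sketch of such an RBFNN evaluation, assuming the common Gaussian basis $\Phi_j(x)=\exp(-\|x-\mu_j\|^2/(2\sigma_j^2))$ (the paper’s exact normalisation is not shown here), is:

```python
import numpy as np

def rbf_output(x, centers, widths, W):
    """y = W^T Phi(x) with Gaussian basis functions.

    x       : input vector, shape (n,)
    centers : mu_j for each hidden node, shape (m, n)
    widths  : sigma_j for each hidden node, shape (m,)
    W       : output-layer weights, shape (m,)
    """
    d2 = np.sum((centers - x) ** 2, axis=1)        # ||x - mu_j||^2
    phi = np.exp(-d2 / (2.0 * widths ** 2))        # Gaussian basis Phi_j(x)
    return W @ phi

# Example: evaluate the approximation of a scalar uncertainty at a 2-D input
rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(25, 2))
widths = np.full(25, 0.5)
W_hat = rng.normal(0.0, 0.1, size=25)              # current weight estimate
print(rbf_output(np.array([0.2, -0.3]), centers, widths, W_hat))
```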
Assumption 2. The basis functions $\Psi_a(Z_a)$ and $\Psi_J(Z_c)$ used later in the actor-critic neural networks satisfy $\beta_a \le ||\Psi_a|| \le \psi_a$ and $\beta_J \le ||\Psi_J|| \le \psi_J$, and their derivatives satisfy $\lambda_a \le ||\dot{\boldsymbol{\Psi}}_a|| \le \gamma_a$ and $\lambda_J \le ||\dot{\Psi}_J|| \le \gamma_J$. In addition, when the neural network approximation is used, the estimation error and its derivative are bounded, i.e., $|\varepsilon_i(Z)| \le \varsigma_i$ and $|\dot{\varepsilon}_i(Z)| \le \xi_i$, where $\varsigma_i, \xi_i$ are positive numbers.
3.0 Main results
To solve the problem of non-affine inputs in system (7), an auxiliary integral system is introduced to provide an auxiliary control input, and the augmented system is expressed as
Remark 3. By adding an auxiliary integration system, the original third-order non-affine input system is transformed into a fourth-order affine input system, which effectively overcomes the non-affine input problem in the system (7).
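Schematically, assuming the augmented system (19) takes the usual integrator-extension form (the right-hand sides $F_i$ below are placeholders, not the paper’s expressions), the construction can be pictured as:

```latex
\begin{equation*}
\dot{x}_1 = F_1(x_1, x_2), \quad
\dot{x}_2 = F_2(\bar{x}_2, x_3), \quad
\dot{x}_3 = F_3(\bar{x}_3, x_4), \quad
\dot{x}_4 = u_f, \quad y = x_1,
\end{equation*}
```

where $x_4$ plays the role of the original non-affine actuator input and the new control $u_f$ enters the last equation affinely.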
The backstepping design process, influenced by the non-affine input, consists of four steps; the actual control law is given in Step 4, and the control block diagram is shown in Fig. 2.
3.1 Design steps of the reinforcement learning adaptive fault-tolerant IGC method
Define the error variables as follows
where $g_{id}$, $i = 1, 2, 3$, is the filtered signal of the virtual control law for the i-th subsystem. We introduce a new variable $g_{ic}$, obtained from
Remark 4. By introducing dynamic surfaces, the derivatives of the virtual controllers in the implementation can be obtained by filtering rather than by analytic differentiation, which in turn reduces the computational complexity.
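A minimal sketch of such a dynamic-surface filter step, assuming the common first-order form $\tau_i \dot{g}_{id} + g_{id} = g_{ic}$ (Equation (21) itself is not reproduced here), is:

```python
def dsc_filter_step(g_id, g_ic, tau, dt):
    """One Euler step of an assumed first-order dynamic-surface filter.

    Assumed filter form: tau * d(g_id)/dt + g_id = g_ic, so the filtered signal g_id
    tracks the virtual control g_ic without differentiating g_ic analytically.
    """
    g_id_dot = (g_ic - g_id) / tau
    return g_id + dt * g_id_dot

# Example: filter a constant virtual control starting from zero
g_id = 0.0
for _ in range(100):
    g_id = dsc_filter_step(g_id, g_ic=1.0, tau=0.1, dt=0.01)
print(g_id)   # approaches g_ic = 1.0
```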
The boundary layer error is defined as
where $g_{ic}$ is the virtual control law designed for the i-th subsystem and $0 < \tau_i < 1$, $i = 1, 2, 3$, is the filter time constant to be designed. Then, combining Equations (20) and (21), we can obtain that
Furthermore, for the unknown nonlinear function, we define
Let $\hat{\theta}$ be the estimate of θ. The corresponding estimation error is defined as $\tilde{\theta}$ = $\hat{\theta}$ − θ.
3.1.1 Step 1
Define $D_1 = \sup_{t \geq 0}||d_1(t)||$ and denote $\hat{D}_1$ as the estimate of $D_1$. Moreover, assume that the error $\bar{\varepsilon }_{{D_{1}}}$ of the disturbance estimate obtained with the tanh-function bound is bounded:
where $\varepsilon_{D_1} > 0$ is a parameter to be designed. Combining Equations (18) and (21) and the first formula of Equation (19), we can get that
Hence, it follows that
An RBFNN is introduced to approximate the nonlinearity $\Delta f_1$ in Equation (18). Obviously, $\Delta f_1 = W_1^T \Phi_1(x_1) + \varepsilon_{\Delta 1}(x_1)$, where $\varepsilon_{\Delta 1}(x_1)$ is the RBFNN estimation error with upper bound $\varepsilon_{m1}$.
Based on Lemma 2, Lemma 3 and Lemma 5, we can obtain that
where $\varepsilon_{11} > 0$ is a parameter to be designed.
Remark 5. In this paper, in order to reduce the computational burden, the upper bound of the neural network weights is used for adaptive compensation. Alternatively, the problem can be solved directly with a multi-dimensional weight vector without using the upper bound; the resulting controller and adaptive update laws take a form similar to that of the present method.
The virtual controller can be designed as follows
where $k_1 > 0$ is a parameter to be designed.
Combining Equations (26)–(28), we can rewrite Equation (26) as
Define the adaptive update law of $\hat{D}_1$ as
where $\eta_{D_1}, \sigma_{D_1} > 0$ are parameters to be designed.
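As an illustration of this kind of bound-estimation law, the following sketch performs one Euler step of a σ-modified adaptation rule; the specific form is an assumption in the spirit of Equation (30), not the paper’s exact law.

```python
import numpy as np

def dhat_step(D_hat, z, eta, sigma, eps, dt):
    """One Euler step of an assumed sigma-modified bound-estimation law.

    Assumed form: D_hat_dot = eta * (z * tanh(z / eps) - sigma * D_hat).
    The tanh term replaces |z| as in Lemma 1, and the -sigma*D_hat leakage
    keeps the estimate bounded.
    """
    D_hat_dot = eta * (z * np.tanh(z / eps) - sigma * D_hat)
    return D_hat + dt * D_hat_dot

# Example: update the estimate for a given tracking error z1
print(dhat_step(D_hat=0.0, z=0.5, eta=0.01, sigma=20.0, eps=500.0, dt=0.001))
```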
3.1.2 Step 2
We define the cost function as follows
i.e., $c = \dot{J}$.
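Before detailing the critic and actor networks below, the following sketch shows one discrete-time step of σ-modified gradient-descent updates of the kind used in this step; the residual forms, function names and gains are illustrative assumptions, with the exact laws given by Equations (36) and (45).

```python
import numpy as np

def critic_step(theta_J, psi_J, c, eta_J, sigma_J, dt):
    """One Euler step of an assumed sigma-modified gradient-descent critic update.

    Assumed residual: e_c = theta_J @ psi_J - c, i.e. the critic's cost estimate
    is driven towards the instantaneous cost signal c.
    """
    e_c = theta_J @ psi_J - c
    theta_J_dot = -eta_J * (e_c * psi_J + sigma_J * theta_J)  # gradient of 0.5*e_c**2 plus leakage
    return theta_J + dt * theta_J_dot

def actor_step(theta_a, psi_a, e_a, eta_a, sigma_a, dt):
    """One Euler step of an assumed sigma-modified actor update driven by an
    actor error e_a (which, in the paper, couples the actor output with the
    critic's cost estimate)."""
    theta_a_dot = -eta_a * (e_a * psi_a + sigma_a * theta_a)
    return theta_a + dt * theta_a_dot

# Tiny usage example with random features
rng = np.random.default_rng(1)
theta_J = critic_step(np.zeros(10), rng.normal(size=10), c=0.3,
                      eta_J=0.1, sigma_J=50.0, dt=0.001)
print(theta_J)
```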
Critic network: due to the non-deterministic nature of the cost function, a neural network is used to estimate it as
where $\Theta_J \in R^{l_c}$ is the ideal critic network weight, $l_c$ represents the number of hidden nodes, $\varepsilon_J$ represents the estimation error and $Z_c = [z_2]$ is the input to the critic neural network. Define the estimate of the cost function as
where $\hat{\Theta}_J$ is the actual critic network weight and $\tilde{\Theta}_J = \hat{\Theta}_J - \Theta_J$, with $\tilde{\Theta}_J$ being the critic neural network weight error. Then we define the critic error as
The critic error function can be designed as
Within the framework of σ-modification, using the gradient descent method, we can obtain the update law of $\hat{\Theta}_J$ as
Actor network: with Equation (18), Equation (21) and the second formula of Equation (19), we can get that
Thus
By using Lemmas 2, 3 and 5, we have
where $\varepsilon_{12} > 0$ is a parameter to be designed. Based on the approximation capability of the neural network, the uncertainty $\Delta f_2$ in Equation (18) is approximated as
where $\Theta_a \in R^{l_a}$ is the ideal actor network weight with $l_a$ denoting the number of hidden nodes, $Z_a = [x_1,x_2]^T$ is the input to the actor neural network, and $\varepsilon_a(Z_a)$ denotes the function reconstruction error. Additionally, $\hat{\Theta}_a$ is the actual actor network weight and $\tilde{\Theta}_a = \hat{\Theta}_a - \Theta_a$, with $\tilde{\Theta}_a$ being the actor neural network weight error. Then we design the actor error as
We define the actor error function as
According to the gradient descent method, we can get the update law of $\hat{\Theta}_a$ as
However, since $\tilde{\Theta}_a$ is unknown, we replace $\tilde{\Theta}_a$ with $\hat{\Theta}_a$. Substituting Equation (41) into Equation (43) yields
By introducing σ correction, Equation (44) can be rewritten as
We design the following virtual controller
where $k_2 > 0$ is a parameter to be designed. With the aid of Equations (38)–(46), we know that
3.1.3 Step 3
Define $D_3 = \sup_{t \geq 0}||d_3(t)||$ and let $\hat{D}_3$ be the estimate of $D_3$. Furthermore, assume that the error $\bar{\varepsilon }_{{D_{3}}}$ of the disturbance estimate obtained with the tanh-function bound is bounded:
where $\varepsilon_{D_3} > 0$ is a parameter to be designed. In view of Equations (18), (21) and the third formula of Equation (19), we can get that
Thus
An RBFNN is introduced to approximate the nonlinearity $\Delta f_3$ in Equation (18). Ideally, $\Delta f_3 = W_3^T \Phi_3(x_1,x_2,x_3) + \varepsilon_{\Delta 3}(x_1,x_2,x_3)$, where $\varepsilon_{\Delta 3}(x_1,x_2,x_3)$ is the RBFNN estimation error with upper bound $\varepsilon_{m3}$; then, combining Lemmas 2, 3 and 5 yields
where $\varepsilon_{13} > 0$ and $\varepsilon_{33} > 0$ are parameters to be designed. Design the following control signals
where $k_3 > 0$ is a parameter to be designed.
Combining Equations (50)–(52), it can be known that
Define the adaptive update law of $\hat{D}_3$ as
where $\eta_{D_3}, \sigma_{D_3} > 0$ are parameters to be designed.
3.1.4 Step 4
In view of Equations (18), (21) and the fourth formula of Equation (19), we have
Thus
With the aid of Lemmas 2, 3 and 5, it is obvious that
where $\varepsilon_{14} > 0$ and $\varepsilon_{34} > 0$ are parameters to be designed. Select the final actual control law $u_f$ as
where $k_4 > 0$ is a positive parameter to be designed.
In view of Equations (56)–(58), it can be known that
Finally, we select the following adaptive update laws for $\hat{\theta}_1$ and $\hat{\theta}_3$:
where $\eta_1, \eta_3 > 0$ are the gains of the adaptive update laws and $\sigma_1, \sigma_3 > 0$ are parameters to be designed.
3.2 Analysis of stability
Theorem 1. Consider the NAIGC system (7) in the presence of actuator faults and unknown uncertainties, with the controller (58), the parameter update laws (30) and (54), the gradient-descent update laws (36) and (45), and the adaptive update law (60). Suppose that Assumptions 1 and 2 are satisfied and that the error of the hyperbolic tangent function estimate of the disturbance is bounded. Then the following conclusions hold:
• The output guidance strategy of the system eventually converges to near zero, i.e., precision guided interception can be achieved.
• The boundedness of all signals can be guaranteed and the tracking error converges to zero.
We construct the Lyapunov function
Combining Equations (22), (29), (53), (59) we can take the derivative of V(t) as
According to Lemma 1, we can obtain that
Consequently, the sixth and eighth terms of Equation (62) satisfy the following inequality
Combining Assumption 1 and Equations (24), (48), the seventh and ninth terms of Equation (62) satisfy
With Equations (64), (65) and the update laws (30), (43), (36), (54), Equation (62) can be rewritten as
In view of inequality $2ab\leqslant a^{2} + b^{2}$ and Assumption 1 we have
According to Lemma 3, we can get that
Substituting Equations (67), (68) into Equation (66) yields that
Based on Assumption 2, we know that $||\Psi_a|| \le \psi_a$ and $||\Psi_J|| \le \psi_J$. The sixth term of Equation (69) can be calculated as
Based on Assumption 2, we know that $||\dot{\Psi}_J|| \le \gamma_J$ and $|\dot{\varepsilon}_J| \le \xi_J$. The seventh term of Equation (69) satisfies
Based on Assumption 2, we know that $|\varepsilon_a| \le \varsigma_a$. The eighth and ninth terms of Equation (69) can be formulated as
Considering the following inequality
Combining Equations (70)–(73), we can finally draw the following conclusion
where
According to Lemma 4, V(t) is bounded; hence, the parameters in V(t) are bounded. Furthermore, the control signals are convergent and bounded, so we can conclude that the NAIGC system is stable. The proof is completed.
4.0 Simulation
In this section, the validity and effectiveness of the proposed method are verified by numerical simulations that consider the time-varying manoeuvre acceleration of the target and time-varying actuator failures of the missile flight control. The robustness of the proposed method is also verified by comparison with methods that do not use the reinforcement learning actor-critic architecture.
The initial conditions of the missile kinematic equations and the initial velocity of the target are given in Ref. [Reference Wang and Yuan10]. The aerodynamic and body parameters of the missile are also given in Ref. [Reference Wang and Yuan10], and the elevator deflection is limited to [−30∘,30∘]. The initial position of the missile is set as x(0) = 0m, y(0) = 0m. The flight path angle of the target is initialised as $\gamma_T(0) = 0$. The initial conditions of the actuator fault output are $\lambda_\delta(0) = 1$, u(0) = 0, $d_\delta(0) = 0$. The initial values of the adaptive parameters and the neural-network-related parameters in the control steps are as follows:
Different cases consider different initial target positions, different actuator failures and different time-varying target manoeuvre accelerations $A_{Tr}$ and $A_{T\lambda}$; the specific parameters are shown in Tables 1, 2 and 3, respectively. The control gains are set to $k_1 = 5$, $k_2 = 20$, $k_3 = 40$, $k_4 = 150$. The parameters of the tanh-function bounds are chosen as $\varepsilon_{D_1} = 500$, $\varepsilon_{D_3} = 100$. The parameters of the actor-critic network weight gradient-descent update laws are chosen as $\eta_a = 0.1$, $\sigma_a = 50$, $\eta_J = 0.1$, $\sigma_J = 50$. The parameters of the adaptive update laws are chosen as $\eta_{D_1} = 0.01$, $\sigma_{D_1} = 20$, $\eta_{D_3} = 0.001$, $\sigma_{D_3} = 1\times 10^4$, $\eta_1 = 0.001$, $\sigma_1 = 50$, $\eta_3 = 5\times 10^{-7}$, $\sigma_3 = 10$. The filter parameters are chosen as $\tau_1 = 0.2$, $\tau_2 = \tau_3 = 0.1$. The other parameters are $c_0 = 0.1$ and $\varepsilon_{11} = \varepsilon_{12} = \varepsilon_{13} = \varepsilon_{33} = \varepsilon_{14} = \varepsilon_{34} = 1$.
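For convenience, these numerical settings can be collected in a single configuration structure; the variable names are illustrative, and only the values come from the text above.

```python
# Simulation gains and parameters as listed above (variable names are illustrative).
gains = dict(
    k1=5, k2=20, k3=40, k4=150,                           # control gains
    eps_D1=500, eps_D3=100,                               # tanh-bound parameters
    eta_a=0.1, sigma_a=50, eta_J=0.1, sigma_J=50,         # actor-critic update gains
    eta_D1=0.01, sigma_D1=20, eta_D3=0.001, sigma_D3=1e4,  # bound-estimation gains
    eta_1=0.001, sigma_1=50, eta_3=5e-7, sigma_3=10,      # adaptive update gains
    tau_1=0.2, tau_2=0.1, tau_3=0.1,                      # filter time constants
    c0=0.1,
    eps_11=1, eps_12=1, eps_13=1, eps_33=1, eps_14=1, eps_34=1,
)
```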
The simulation results for Case 1, Case 2 and Case 3 are shown in Figs 3, 4 and 5, respectively. The horizontal coordinates of the missile and the target in the two-dimensional plane are shown in Figs 3(a), 4(a) and 5(a), and the vertical coordinates are shown in Figs 3(b), 4(b) and 5(b). Figures 3(c), 4(c) and 5(c) show the system output y (guidance variable χ). Figures 3(d), 4(d) and 5(d) display the normal acceleration $n_L$ of the missile, and Figs 3(e), 4(e) and 5(e) show the pitch rate q. The actuator outputs with effectiveness faults and bias faults are depicted in Figs 3(f), 4(f) and 5(f). The estimates of the upper bounds of the disturbance norms $D_1$, $D_3$ are given in Figs 3(g), (h), 4(g), (h), 5(g) and (h). The norms of the estimates of the parameters $\theta_1$ and $\theta_3$ are shown in Figs 3(i), (j), 4(i), (j), 5(i) and (j). The weight norms of the actor-critic neural networks are depicted in Figs 3(k), (l), 4(k), (l), 5(k) and (l). In conclusion, the results show that the interception strategy χ converges to zero and accurate hit-to-kill interception is maintained even when the target possesses time-varying acceleration and the missile suffers an actuator fault. As a result, the effectiveness of the proposed method is verified.
At the same time, we compare the proposed control method with the backstepping fault-tolerant non-affine integrated guidance and control (BS-FTNAIGC) method, the adaptive boundary estimation fault-tolerant non-affine integrated guidance and control (ABE-FTNAIGC) method and the radial basis function fault-tolerant non-affine integrated guidance and control (RBF-FTNAIGC) method. The initial conditions are as follows: the relative distance between missile and target along the line of sight (LOS) r(0) = 10,816.654m, the angle between the LOS and the horizontal reference line λ = 0.588rad, the target location $x_T$ = 9,000m, $y_T$ = 6,000m, and the target acceleration $A_{Tr} = 5\sin(45t)$, $A_{T\lambda} = 10 + 4\sin((7.8/0.022)t)$. The actuator faults occur at t > 2s with $\lambda_\delta$ = 0.8, $d_\delta$ = 0.05 and at t > 4s with $\lambda_\delta$ = 0.6, $d_\delta$ = 0.12. The other initial conditions are the same as above. The specific comparison is as follows: Fig. 6 displays the system output of the four methods for the NAIGC system, and the actuator fault outputs of the four methods are shown in Fig. 7. According to the simulation results, the controller output and the control objective of this paper are ultimately stable and convergent. At the same time, the overshoot and oscillation of the proposed control method are smaller. Compared with the other methods, the proposed control method stabilises the variables more quickly, and the steady-state error after stabilisation is smaller. It can be seen that the actor-critic RBFNN has great advantages in application to non-affine IGC systems.
5.0 Conclusion
In this paper, the NAIGC system is established for a class of STT missiles with actuator failures, target acceleration variations and coupled multi-source uncertainties. The non-affine problem of the established model is solved by the newly introduced integral expansion system. By introducing the hyperbolic tangent function, the RBFNN and the reinforcement learning actor-critic neural network architecture, different adaptive laws and gradient-descent update laws are constructed, which reduce the effects of actuator faults and target acceleration changes and effectively compensate for the influence of multi-source uncertainty. Therefore, this paper not only designs a non-affine IGC model that is better suited to practical applications, but also proposes a new control method that applies the reinforcement learning actor-critic architecture to NAIGC and achieves accurate guidance. Finally, the effectiveness and superiority of the proposed method are verified by numerical simulation. In the future, we will study three-dimensional IGC and add constraints.
Acknowledgements
This work was supported in part by the Foundation of China National Key Laboratory of Science and Technology on Test Physics & Numerical Mathematics (Grant Number: JP2022-800006000107-237), the Foundation of China National Key Laboratory of Science and Technology on Test Physics & Numerical Mathematics (Grant Number: 08-YY-2023-R11) and the National Natural Science Foundation of China (Grant Number: 62303378). Meanwhile, it was also supported by the Foundation of Shanghai Astronautics Science and Technology Innovation (Grant Number: SAST2022-114).
Competing interests
The authors declare none.