Nomenclature
- LOS: line of sight
- V: velocity
- q: pitch rate
- ${n_L}$: normal acceleration
- $l$: reference length
- S: reference area
- m: mass of the missile
- ${I_{yy}}$: moment of inertia about the pitch axis
- ${T_\alpha }$: turning rate time constant
- $r$: range along the LOS
- ${V_r}$: projection of the relative velocity along the LOS
- ${V_\lambda }$: projection of the relative velocity orthogonal to the LOS
- ${A_{Tr}}$: projection of the target acceleration along the LOS
- ${A_{T\lambda }}$: projection of the target acceleration orthogonal to the LOS
- ${x_M}$: x-coordinate of the missile
- ${y_M}$: y-coordinate of the missile
- ${z_M}$: z-coordinate of the missile
- ${x_T}$: x-coordinate of the target
- ${y_T}$: y-coordinate of the target
- ${z_T}$: z-coordinate of the target
- STT: skid-to-turn
- IGC: integrated guidance and control
- FTIGC: fault-tolerant integrated guidance and control
- NAIGC: non-affine integrated guidance and control
- FTC: fault-tolerant control
- RL: reinforcement learning
- RBFNN: radial basis function neural network
- BS-FTNAIGC: backstepping fault-tolerant non-affine integrated guidance and control
- ABE-FTNAIGC: adaptive boundary estimation fault-tolerant non-affine integrated guidance and control
- RBF-FTNAIGC: radial basis function fault-tolerant non-affine integrated guidance and control
Greek symbols
- $\alpha $: angle-of-attack
- $\theta $: pitch angle
- ${\gamma _M}$: track angle
- $\rho $: density of air
- $\lambda $: LOS angle
1.0 Introduction
The integrated guidance and control (IGC) design method has gained significant attention since it was first proposed, owing to its ability to maximise the missile’s flight performance [Reference Guo, Wang, Hu and Guo1] and overall operational effectiveness. By leveraging the coupling between the guidance and control systems, it offers the combined advantages of low design cost and strong robustness, and numerous studies have explored this method. Several approaches have been investigated for integrated guidance and control design, including sliding mode control based on sliding surface design [Reference Wu, Lu and Wang2], optimal control methods [Reference Wang, Zhang, Lin and Li3], feedback linearisation [Reference Xu, Chen and Wang4], the backstepping control method [Reference Hu, Wei and Wang5], adaptive control [Reference Jiang, qing Tian, yan Sun and ge Liang6], and active disturbance rejection control [Reference Zhao, Cao and Huang7]. During flight, missile systems commonly encounter various failures, such as rudder surface failures, sensor failures and other mechanical failures. Recent research has therefore focused on fault-tolerant control (FTC) challenges [Reference Wang, Yuan, Pan and Wei8–Reference Wang and Yuan10]. For instance, an adaptive barrier fast terminal sliding mode control method was proposed to mitigate actuator failures in unmanned aerial vehicles [Reference Najafi, Vu, Mobayen, Asad and Fekih11]. For multi-agent systems with node failures and switching topologies, a distributed adaptive fuzzy fault-tolerant control method has been suggested [Reference Zhao, Zhao and Che12]. FTC-related studies of IGC systems remain scarce: one study addressed elevator and rudder failures within a strict-feedback IGC structure [Reference Wang and Yuan10], Ashrafifar and Jegarkandi [Reference Ashrafifar and Jegarkandi13] considered a burned or broken tailplane failure and developed an IGC system for a ground-to-air missile, and Zhao [Reference Zhao14] proposed a fault-tolerant control method for handling loss of rudder surface effectiveness.
Overall, the fault-tolerant integrated guidance and control (FTIGC) design method has attracted extensive research effort, and various approaches have been explored for integrated guidance and control system design. Nevertheless, the fault-tolerant controllers discussed above rely on transforming the vehicle model into an affine-in-the-input form; the case in which the input of every subsystem is fully non-affine is not considered. In reality, a non-affine description represents the system more realistically. In addition, there is currently no fault-tolerant control design for the non-affine form of the IGC model; only Ref. [Reference Chen15] considers non-affine aerodynamic characteristics in building the IGC model of an STT missile. Undoubtedly, in practical engineering applications, many parameters of the missile flight and guidance system, such as torque, exhibit non-affine characteristics. Therefore, in this paper we address the aforementioned problem by developing a new NAIGC scheme for missiles subject to rapidly changing actuator failures and multiple uncertainties from different sources. We describe the missile dynamics and missile-target engagement kinematics as a non-affine nonlinear IGC system, which is more consistent with practical engineering applications.
Reinforcement learning (RL) has gained significant attention as a learning control method owing to its ability to deal with unknown uncertainties, and it has been extensively researched in recent years [Reference Liu, Li, Tong and Chen16–Reference Peng, Hu, Shi, Luo, Huang, Ghosh and Huang20]. In Ref. [Reference Yang, Modares, Wunsch and Yin21], RL was introduced to address the distributed leader-follower output synchronisation problem for linear heterogeneous systems. Moreover, actor-critic structures are frequently employed in reinforcement learning for uncertain systems [Reference Fan, Yang and Ye22, Reference Hu, Li, Xue and Liu23]: the critic network receives information about the system from the task environment and provides a cost function to evaluate the control performance, and, based on this cost function, the actor network generates the next control policy for the actuator. Ouyang et al. [Reference Ouyang, Dong, Wei and Sun24] designed an actor-critic adaptive control method for tracking control of an uncertain elastic joint robot, in which the critic neural network approximates the cost function while the actor neural network handles the system uncertainty and generates the control input for the actuator. Liu et al. [Reference Liu, Shan, Rong and Zheng25] proposed an incremental reinforcement learning control method with an adaptive learning rate to improve the success rate of flight controllers. A distributed reinforcement learning guidance strategy under angle-of-attack constraints was investigated in Ref. [Reference Bohao, Xuman, Xiaofei, Yunjie and Guofei26]. Pei et al. [Reference Pei, Shao-ming, Jiang and De-fu27] used the deep deterministic policy gradient algorithm to cast integrated guidance and control as a reinforcement learning problem, intercepting targets with agents generated by reinforcement learning, and numerically verified the effectiveness and robustness of the method. In Ref. [Reference Song, Luo, Zhao, Hu and Zhang28], the IGC system was modelled as a reinforcement learning process based on a three-degree-of-freedom motion model of a hypersonic vehicle in the longitudinal plane, and a proximal policy optimisation algorithm-based IGC system was designed. It can be seen that the actor-critic reinforcement learning architecture performs satisfactorily in controlling vehicle and guidance systems. However, the IGC model exhibits intricate nonlinear dynamics, encompassing nonlinear relationships, non-affine terms and uncertain disturbances, while the IGC problem simultaneously demands a high level of certainty from the control scheme, i.e., precise system control along the predetermined planned trajectory. The applicability of the actor-critic reinforcement learning approach to such complex models has yet to be investigated.
Sensor measurement bias, as well as the actuator effectiveness loss and bias faults generated during flight, can disturb the attitude control system, which makes the controller design more challenging. In addition, external disturbances and structural uncertainties should be considered, as they also cause difficulties during the controller design process. Inspired by the above, this paper focuses on the FTIGC problem for a class of non-affine systems with structural uncertainties, actuator failures and external disturbances, i.e., the NAIGC system. The main challenges are how to model the NAIGC system and how to handle the non-affine structure, the various unknown uncertainties and the time-varying faults. By introducing an adaptive expansion integral system to deal with the non-affine problem, and by fully combining the reinforcement learning actor-critic architecture with the approximation capability of the radial basis function neural network (RBFNN), the unknown uncertainties and faults can be handled well using bounded adaptive control techniques. Compared with existing results, the method proposed in this paper makes the following contributions:
• To the best of the authors’ knowledge, this is the first time an actor-critic method is applied to the design of adaptive fault-tolerant NAIGC, and it provides an effective means of handling multiple uncertainties.
• A new non-affine integrated guidance and control design model is established for a class of STT missiles with actuator failures and multiple uncertainties, and it can be extended to other aircraft with non-affine structures.
• Benefiting from the combination of adaptive boundary estimation, the RBFNN and the actor-critic architecture, the missile’s ability to respond to actuator failures and target manoeuvres is greatly improved.
2.0 Problem formulation and preliminaries
2.1 Non-affine IGC model
Consider the following nonlinear longitudinal model of the missile, in which gravity and the coupling between the longitudinal and lateral channels are neglected:
where $c_x$, $c_{z1}$, $c_{z2}$, $c_{m1}$, $c_{m2}$, $c_{m\delta_e}$ denote the aerodynamic coefficients.
The kinematics of a planar missile intercept target can be described as [Reference Wang, Xiong, Wang, Song and Lai29]:
Considering the actuator fault, let u(t) denote the commanded actuator input. The output of the faulty actuator is then expressed as
where $d_\delta$ represents the bias of the actuator fault and $\lambda_\delta$ represents the scale factor of the actuator gain fault. Assume that $d_\delta$ is a bounded unknown variable and that $\lambda_\delta$ takes values in the interval [0,1]. According to the corollary in Ref. [Reference Wang and Yuan10], when $V_{\lambda}\rightarrow k_{0}\sqrt{r}$, where $k_0 > 0$ is a constant, a direct hit can be obtained. Thus, by defining $\chi =V_{\lambda }-k_{0}\sqrt{r}$, we can obtain the time derivative of χ.
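For concreteness, the following minimal sketch computes r, λ, $V_r$, $V_\lambda$ and the guidance variable χ from missile and target states, assuming the standard planar relative-motion definitions (the kinematic equations of Ref. [Reference Wang, Xiong, Wang, Song and Lai29] are not reproduced in this excerpt); the function names and the value of $k_0$ are illustrative, and the gain-loss-plus-bias actuator fault model is likewise an assumed common form consistent with the description above.

```python
import numpy as np

def engagement_states(p_m, v_m, p_t, v_t, k0=0.1):
    """Planar engagement variables under the standard relative-motion definitions.

    p_m, v_m : missile position/velocity in the vertical plane, shape (2,)
    p_t, v_t : target position/velocity, shape (2,)
    k0       : positive constant in the guidance variable chi = V_lambda - k0*sqrt(r)
    """
    dp = p_t - p_m                            # relative position
    dv = v_t - v_m                            # relative velocity
    r = np.linalg.norm(dp)                    # range along the LOS
    lam = np.arctan2(dp[1], dp[0])            # LOS angle
    e_los = dp / r                            # unit vector along the LOS
    e_perp = np.array([-e_los[1], e_los[0]])  # unit vector orthogonal to the LOS
    V_r = dv @ e_los                          # relative velocity along the LOS
    V_lam = dv @ e_perp                       # relative velocity orthogonal to the LOS
    chi = V_lam - k0 * np.sqrt(r)             # guidance variable chi
    return r, lam, V_r, V_lam, chi

def faulty_actuator(u, lam_delta, d_delta):
    """Assumed gain-loss-plus-bias fault model: actual deflection = lam_delta*u + d_delta."""
    return lam_delta * u + d_delta

# Example: target ahead of and above the missile
print(engagement_states(p_m=np.array([0.0, 0.0]),       v_m=np.array([600.0, 50.0]),
                        p_t=np.array([9000.0, 6000.0]), v_t=np.array([-300.0, 0.0])))
```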
Meanwhile, according to the kinetic equations of $n_L$ and $q$, we can obtain
Define $x_1 = \chi$, $x_2 = n_L$, $x_3 = q$ and
We can get the following non-affine system,
where $x_i \in R$, $i = 1, 2, 3$, are the system state variables and $y = x_1$ is the system output. $\Delta f_i$ and $\Delta b_i$, $i = 1, 2, 3$, are the uncertainties caused by measurement errors; in fact, $\Delta b_i \in [-0.5,+0.5]$. Obviously, the integrated guidance and control design model is a third-order non-affine system, and the relationship between the outer and inner loops is shown schematically in Fig. 1.
Remark 1. The non-affine IGC model established in this paper is more general: the aerodynamic characteristics of the missile and the rate of change of the deflection are considered in the form of non-affine functions, so that each subsystem contains a non-affine input. This is of great reference value for practical engineering applications, but it also makes the controller design more difficult.
The design goal of this paper is a class of RBFNN and actor-critic based adaptive controllers such that the guidance variable converges to a neighbourhood of zero in the presence of multiple factors, including actuator failures, simultaneously varying unknown target acceleration and coupled multi-source uncertainty, and such that the relevant gain parameters in the controllers remain bounded.
In this article, the following lemmas and assumptions are necessary:
Lemma 1. [Reference Wang and Yuan30] For any $\varepsilon$ > 0 and any z $\in$ R, the following inequality holds:
where κ is the positive number satisfying κ = e −(κ+1), i.e., κ ≈ $0.2785$.
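As a quick sanity check of this bound, the following sketch evaluates the gap $|z| - z\tanh(z/\varepsilon)$ on a grid; the choice of ε and the grid resolution are arbitrary.

```python
import numpy as np

# Numerical check of Lemma 1: 0 <= |z| - z*tanh(z/eps) <= kappa*eps with kappa ~ 0.2785.
kappa, eps = 0.2785, 0.5
z = np.linspace(-10.0, 10.0, 2001)
gap = np.abs(z) - z * np.tanh(z / eps)
assert np.all(gap >= -1e-12) and np.all(gap <= kappa * eps + 1e-9)
print(gap.max(), kappa * eps)   # the maximum gap approaches kappa*eps
```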
Lemma 2. [Reference Huang31] (Cauchy inequality) Let $\|\bullet\|$ denote the Euclidean norm of a vector, i.e. $||x||=\sqrt{{\bf x}^{T}{\bf x}}$ . For all ${\bf x},{\bf y}\in R^{m}$ , the following inequality holds:
Lemma 3. [Reference Xia, Lian, Su, Shen and Chen32] (Young’s inequality) Given positive constants p and q satisfying 1/p + 1/q = 1, for any x, y $\in$ R and any $\varepsilon$ > 0, the following inequality holds:
Lemma 4. [Reference He and Dong33] Consider a Lyapunov function V(t) with bounded initial condition V(0). If the derivative of V(t) satisfies
where $C_V$ and $E_V$ are positive constants, then V(t) is bounded.
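For completeness, the bound behind Lemma 4 follows from the standard comparison argument; assuming the usual inequality $\dot{V}(t) \le -C_V V(t) + E_V$, a sketch of the resulting estimate is:

```latex
% Multiplying \dot{V} \le -C_V V + E_V by e^{C_V t} and integrating from 0 to t gives
\begin{equation*}
V(t) \;\le\; V(0)\,e^{-C_V t} \;+\; \frac{E_V}{C_V}\left(1 - e^{-C_V t}\right)
\;\le\; V(0) + \frac{E_V}{C_V},
\end{equation*}
% so V(t) stays bounded and ultimately enters the set \{ V \le E_V / C_V \}.
```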
Lemma 5. [Reference Wang, Yuan, Pan and Che34] For any constant $\varepsilon$ > 0 and any variable z $\in$ R, the following relationship holds
Assumption 1. There exist positive constants $\underline{g}$ and $\bar{g}$ , the following inequality holds
where $\bar{x}_{i+1}=[x_{1},\ldots, x_{i+1}]$, $i=1,\ldots, n$, and $x_{n+1}=u$.
Remark 2. This assumption is introduced to make the whole system controllable.
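Since the inequality of Assumption 1 appears as a display equation in the original, the following sketch records a common form of such a controllability-type condition for non-affine systems; it is stated here as an assumption about the usual form, not as a reproduction of the paper’s exact statement:

```latex
\begin{equation*}
0 < \underline{g} \;\le\; \left|\frac{\partial f_i\!\left(\bar{x}_{i+1}\right)}{\partial x_{i+1}}\right| \;\le\; \bar{g},
\qquad i = 1,\ldots,n, \quad x_{n+1} = u.
\end{equation*}
```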
2.2 Neural networks for approximation
In this paper, we need to approximate the system uncertainty and the unknown cost function. The system uncertainty is estimated using an RBFNN, which is a three-layer network: the first layer is the input layer and the second is the hidden layer (an RBFNN generally contains only one hidden layer, and the weights from the input layer to the hidden layer are all 1), while the third layer is the output layer. In this paper, we use the following Gaussian function, denoted $\phi(x)$, as the radial basis function, where $\mu_j$ is the centre of the j-th hidden node and $\sigma_j$ is its width:
The final output is defined as
where $x \in R^n$ and $y \in R$ are the input and output of the RBFNN, respectively, $\hat{W} = [w_1,\ldots,w_m]^T \in R^m$ denotes the output-layer weight vector, and m represents the number of hidden nodes. $\Phi(x) = [\Phi_1(x),\ldots,\Phi_m(x)]^T$, where:
It has been shown that, for a smooth function, there exists an optimal weight [Reference Yu, Long, Chen and Wang35] such that
where $\varepsilon$ (x) is the approximation error, which can be made arbitrarily small by increasing the number of nodes in the hidden layer.
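A minimal sketch of such an RBFNN evaluation, assuming the common Gaussian basis $\Phi_j(x)=\exp(-\|x-\mu_j\|^2/(2\sigma_j^2))$ (the paper’s exact normalisation is not shown here), is:

```python
import numpy as np

def rbf_output(x, centers, widths, W):
    """y = W^T Phi(x) with Gaussian basis functions.

    x       : input vector, shape (n,)
    centers : mu_j for each hidden node, shape (m, n)
    widths  : sigma_j for each hidden node, shape (m,)
    W       : output-layer weights, shape (m,)
    """
    d2 = np.sum((centers - x) ** 2, axis=1)        # ||x - mu_j||^2
    phi = np.exp(-d2 / (2.0 * widths ** 2))        # Gaussian basis Phi_j(x)
    return W @ phi

# Example: evaluate the approximation of a scalar uncertainty at a 2-D input
rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(25, 2))
widths = np.full(25, 0.5)
W_hat = rng.normal(0.0, 0.1, size=25)              # current weight estimate
print(rbf_output(np.array([0.2, -0.3]), centers, widths, W_hat))
```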
Assumption 2. The basis functions $\Psi_a(Z_a)$ and $\Psi_J(Z_c)$ used later in the actor-critic neural networks satisfy $\beta_a \le ||\Psi_a|| \le \psi_a$ and $\beta_J \le ||\Psi_J|| \le \psi_J$, and their derivatives satisfy $\lambda_a \le ||\dot{\boldsymbol{\Psi}}_a|| \le \gamma_a$ and $\lambda_J \le ||\dot{\Psi}_J|| \le \gamma_J$. In addition, when the neural network approximation is used, the estimation error and its derivative are bounded, i.e., $|\varepsilon_i(Z)| \le \varsigma_i$ and $|\dot{\varepsilon}_i(Z)| \le \xi_i$, where $\varsigma_i, \xi_i$ are positive numbers.
3.0 Main results
To solve the problem of non-affine inputs in system (7), an auxiliary integral system is introduced to provide an auxiliary control input, and the augmented system is expressed as
Remark 3. By adding an auxiliary integration system, the original third-order non-affine input system is transformed into a fourth-order affine input system, which effectively overcomes the non-affine input problem in the system (7).
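Schematically, assuming the augmented system (19) takes the usual integrator-extension form (the right-hand sides $F_i$ below are placeholders, not the paper’s expressions), the construction can be pictured as:

```latex
\begin{equation*}
\dot{x}_1 = F_1(x_1, x_2), \quad
\dot{x}_2 = F_2(\bar{x}_2, x_3), \quad
\dot{x}_3 = F_3(\bar{x}_3, x_4), \quad
\dot{x}_4 = u_f, \quad y = x_1,
\end{equation*}
```

where $x_4$ plays the role of the original non-affine actuator input and the new control $u_f$ enters the last equation affinely.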
The backstepping design process, influenced by the non-affine input, consists of four steps; the actual control law is given in Step 4, and the control block diagram is shown in Fig. 2.
3.1 Design steps of the reinforcement learning adaptive fault-tolerant IGC method
Define the error variables as follows
where $g_{id}$, $i = 1, 2, 3$, is the filtered signal of the virtual control law for the i-th subsystem. We introduce a new variable $g_{ic}$, obtained from
Remark 4. By introducing dynamic surfaces, the derivatives of the virtual controllers in the implementation can be obtained by filtering rather than by analytic differentiation, which in turn reduces the computational complexity.
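A minimal sketch of such a dynamic-surface filter step, assuming the common first-order form $\tau_i \dot{g}_{id} + g_{id} = g_{ic}$ (Equation (21) itself is not reproduced here), is:

```python
def dsc_filter_step(g_id, g_ic, tau, dt):
    """One Euler step of an assumed first-order dynamic-surface filter.

    Assumed filter form: tau * d(g_id)/dt + g_id = g_ic, so the filtered signal g_id
    tracks the virtual control g_ic without differentiating g_ic analytically.
    """
    g_id_dot = (g_ic - g_id) / tau
    return g_id + dt * g_id_dot

# Example: filter a constant virtual control starting from zero
g_id = 0.0
for _ in range(100):
    g_id = dsc_filter_step(g_id, g_ic=1.0, tau=0.1, dt=0.01)
print(g_id)   # approaches g_ic = 1.0
```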
The boundary layer error is defined as
where $g_{ic}$ is the virtual control law designed for the i-th subsystem and $0 < \tau_i < 1$, $i = 1, 2, 3$, is the filter time constant to be designed. Then, combining Equations (20) and (21), we can obtain that
Furthermore, for the unknown nonlinear function, we define
Let $\hat{\theta}$ be the estimate of θ. The corresponding estimation error is defined as $\tilde{\theta}$ = $\hat{\theta}$ − θ.
3.1.1 Step 1
Define $D_1 = \sup_{t \geq 0}||d_1(t)||$ and denote $\hat{D}_1$ as the estimate of $D_1$. Moreover, assume that the error $\bar{\varepsilon }_{{D_{1}}}$ of the disturbance estimate obtained with the tanh-function bound is bounded:
where $\varepsilon_{D_1} > 0$ is a parameter to be designed. Combining Equations (18) and (21) and the first formula of Equation (19), we can get that
Hence, it follows that
An RBFNN is introduced to approximate the nonlinearity $\Delta f_1$ in Equation (18). Obviously, $\Delta f_1 = W_1^T \Phi_1(x_1) + \varepsilon_{\Delta 1}(x_1)$, where $\varepsilon_{\Delta 1}(x_1)$ is the RBFNN estimation error with upper bound $\varepsilon_{m1}$.
Based on Lemma 2, Lemma 3 and Lemma 5, we can obtain that
where $\varepsilon_{11} > 0$ is a parameter to be designed.
Remark 5. In this paper, in order to reduce the computational burden, the upper bound of the neural network weights is used for adaptive compensation. Alternatively, the problem can be solved directly with a multi-dimensional weight vector without using the upper bound; the resulting controller and adaptive update laws take a form similar to that of the present method.
The virtual controller can be designed as follows
where $k_1 > 0$ is a parameter to be designed.
Combining Equations (26)–(28), we can rewrite Equation (26) as
Define the adaptive update law of $\hat{D}_1$ as
where $\eta_{D_1}, \sigma_{D_1} > 0$ are parameters to be designed.
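As an illustration of this kind of bound-estimation law, the following sketch performs one Euler step of a σ-modified adaptation rule; the specific form is an assumption in the spirit of Equation (30), not the paper’s exact law.

```python
import numpy as np

def dhat_step(D_hat, z, eta, sigma, eps, dt):
    """One Euler step of an assumed sigma-modified bound-estimation law.

    Assumed form: D_hat_dot = eta * (z * tanh(z / eps) - sigma * D_hat).
    The tanh term replaces |z| as in Lemma 1, and the -sigma*D_hat leakage
    keeps the estimate bounded.
    """
    D_hat_dot = eta * (z * np.tanh(z / eps) - sigma * D_hat)
    return D_hat + dt * D_hat_dot

# Example: update the estimate for a given tracking error z1
print(dhat_step(D_hat=0.0, z=0.5, eta=0.01, sigma=20.0, eps=500.0, dt=0.001))
```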
3.1.2 Step 2
We define the cost function as follows
i.e., $c = \dot{J}$.
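Before detailing the critic and actor networks below, the following sketch shows one discrete-time step of σ-modified gradient-descent updates of the kind used in this step; the residual forms, function names and gains are illustrative assumptions, with the exact laws given by Equations (36) and (45).

```python
import numpy as np

def critic_step(theta_J, psi_J, c, eta_J, sigma_J, dt):
    """One Euler step of an assumed sigma-modified gradient-descent critic update.

    Assumed residual: e_c = theta_J @ psi_J - c, i.e. the critic's cost estimate
    is driven towards the instantaneous cost signal c.
    """
    e_c = theta_J @ psi_J - c
    theta_J_dot = -eta_J * (e_c * psi_J + sigma_J * theta_J)  # gradient of 0.5*e_c**2 plus leakage
    return theta_J + dt * theta_J_dot

def actor_step(theta_a, psi_a, e_a, eta_a, sigma_a, dt):
    """One Euler step of an assumed sigma-modified actor update driven by an
    actor error e_a (which, in the paper, couples the actor output with the
    critic's cost estimate)."""
    theta_a_dot = -eta_a * (e_a * psi_a + sigma_a * theta_a)
    return theta_a + dt * theta_a_dot

# Tiny usage example with random features
rng = np.random.default_rng(1)
theta_J = critic_step(np.zeros(10), rng.normal(size=10), c=0.3,
                      eta_J=0.1, sigma_J=50.0, dt=0.001)
print(theta_J)
```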
Critic network: due to the non-deterministic nature of the cost function, a neural network is used to estimate it as
where $\Theta_J \in R^{l_c}$ is the ideal critic network weight, $l_c$ represents the number of hidden nodes, $\varepsilon_J$ represents the estimation error and $Z_c = [z_2]$ is the input to the critic neural network. Define the estimate of the cost function as
where $\hat{\Theta}_J$ is the actual critic network weight and $\tilde{\Theta}_J = \hat{\Theta}_J - \Theta_J$, with $\tilde{\Theta}_J$ being the critic neural network weight error. Then we define the critic error as
The critic error function can be designed as
Within the framework of σ-modification, using the gradient descent method, we can obtain the update law of $\hat{\Theta}_J$ as
Actor network: with Equation (18), Equation (21) and the second formula of Equation (19), we can get that
Thus
By using Lemmas 2, 3 and 5, we have
where $\varepsilon_{12} > 0$ is a parameter to be designed. Based on the approximation capability of the neural network, the uncertainty $\Delta f_2$ in Equation (18) is approximated as
where $\Theta_a \in R^{l_a}$ is the ideal actor network weight with $l_a$ denoting the number of hidden nodes, $Z_a = [x_1,x_2]^T$ is the input to the actor neural network, and $\varepsilon_a(Z_a)$ denotes the function reconstruction error. Additionally, $\hat{\Theta}_a$ is the actual actor network weight and $\tilde{\Theta}_a = \hat{\Theta}_a - \Theta_a$, with $\tilde{\Theta}_a$ being the actor neural network weight error. Then we design the actor error as
We define the actor error function as
According to the gradient descent method, we can get the update law of $\hat{\Theta}_a$ as
However, since $\tilde{\Theta}_a$ is unknown, we replace $\tilde{\Theta}_a$ with $\hat{\Theta}_a$. Substituting Equation (41) into Equation (43) yields
By introducing σ correction, Equation (44) can be rewritten as
We design the following virtual controller
where $k_2 > 0$ is a parameter to be designed. With the aid of Equations (38)–(46), we know that
3.1.3 Step 3
Define $D_3 = \sup_{t \geq 0}||d_3(t)||$ and let $\hat{D}_3$ be the estimate of $D_3$. Furthermore, assume that the error $\bar{\varepsilon }_{{D_{3}}}$ of the disturbance estimate obtained with the tanh-function bound is bounded:
where $\varepsilon_{D_3} > 0$ is a parameter to be designed. In view of Equations (18), (21) and the third formula of Equation (19), we can get that
Thus
An RBFNN is introduced to approximate the nonlinearity $\Delta f_3$ in Equation (18). Ideally, $\Delta f_3 = W_3^T \Phi_3(x_1,x_2,x_3) + \varepsilon_{\Delta 3}(x_1,x_2,x_3)$, where $\varepsilon_{\Delta 3}(x_1,x_2,x_3)$ is the RBFNN estimation error with upper bound $\varepsilon_{m3}$; then, combining Lemmas 2, 3 and 5 yields
where $\varepsilon_{13} > 0$ and $\varepsilon_{33} > 0$ are parameters to be designed. Design the following control signals
where $k_3 > 0$ is a parameter to be designed.
Combining Equations (50)–(52), it can be known that
Define the adaptive update law of $\hat{D}_3$ as
where $\eta_{D_3}, \sigma_{D_3} > 0$ are parameters to be designed.
3.1.4 Step 4
In view of Equations (18), (21) and the fourth formula of Equation (19), we have
Thus
With the aid of Lemmas 2, 3 and 5, it is obvious that
where $\varepsilon_{14} > 0$ and $\varepsilon_{34} > 0$ are parameters to be designed. Select the final actual control law $u_f$ as
where $k_4 > 0$ is a positive parameter to be designed.
In view of Equations (56)–(58), it can be known that
Finally, we select the following adaptive update laws for $\hat{\theta}_1$ and $\hat{\theta}_3$:
where $\eta_1, \eta_3 > 0$ are the gains of the adaptive update laws and $\sigma_1, \sigma_3 > 0$ are parameters to be designed.
3.2 Analysis of stability
Theorem 1. Consider the NAIGC system (7) in the presence of actuator faults and unknown uncertainties, with the controller (58), the parameter update laws (30) and (54), the gradient-descent update laws (36) and (45), and the adaptive update law (60). Suppose that Assumptions 1 and 2 are satisfied and that the error of the hyperbolic tangent function estimate of the disturbance is bounded. Then the following conclusions hold:
• The output guidance strategy of the system eventually converges to near zero, i.e., precision guided interception can be achieved.
• The boundedness of all signals can be guaranteed and the tracking error converges to zero.
We construct the Lyapunov function
Combining Equations (22), (29), (53), (59) we can take the derivative of V(t) as
According to Lemma 1, we can obtain that
Consequently, the sixth and eighth terms of Equation (62) satisfy the following inequality
Combining Assumption 1 and Equations (24), (48), the seventh and ninth terms of Equation (62) satisfy
With Equations (64), (65) and the update laws (30), (43), (36), (54), Equation (62) can be rewritten as
In view of inequality $2ab\leqslant a^{2} + b^{2}$ and Assumption 1 we have
According to Lemma 3, we can get that
Substituting Equations (67), (68) into Equation (66) yields that
Based on Assumption 2, we know that $||\Psi_a|| \le \psi_a$ and $||\Psi_J|| \le \psi_J$. The sixth term of Equation (69) can be calculated as
Based on Assumption 2, we know that $||\dot{\Psi}_J|| \le \gamma_J$ and $|\dot{\varepsilon}_J| \le \xi_J$. The seventh term of Equation (69) satisfies
Based on Assumption 2, we know that $|\varepsilon_a| \le \varsigma_a$. The eighth and ninth terms of Equation (69) can be formulated as
Considering the following inequality
Combining Equations (70)–(73), we can finally draw the following conclusion
where
According to Lemma 4, V(t) is bounded; hence, the parameters in V(t) are bounded. Furthermore, the control signals are convergent and bounded, so we can conclude that the NAIGC system is stable. The proof is completed.
4.0 Simulation
In this section, the validity and effectiveness of the proposed method are verified by numerical simulations that consider the time-varying manoeuvre acceleration of the target and time-varying actuator failures of the missile flight control. The robustness of the proposed method is also verified by comparison with methods that do not use the reinforcement learning actor-critic architecture.
The initial conditions of the missile kinematic equations and the initial velocity of the target are given in Ref. [Reference Wang and Yuan10]. The aerodynamic and body parameters of the missile are also given in Ref. [Reference Wang and Yuan10], and the elevator deflection is limited to [−30∘,30∘]. The initial position of the missile is set as x(0) = 0m, y(0) = 0m. The flight path angle of the target is initialised as $\gamma_T(0) = 0$. The initial conditions of the actuator fault output are $\lambda_\delta(0) = 1$, u(0) = 0, $d_\delta(0) = 0$. The initial values of the adaptive parameters and the neural-network-related parameters in the control steps are as follows:
Different cases consider different initial target positions, different actuator failures and different time-varying target manoeuvre accelerations $A_{Tr}$ and $A_{T\lambda}$; the specific parameters are shown in Tables 1, 2 and 3, respectively. The control gains are set to $k_1 = 5$, $k_2 = 20$, $k_3 = 40$, $k_4 = 150$. The parameters of the tanh-function bounds are chosen as $\varepsilon_{D_1} = 500$, $\varepsilon_{D_3} = 100$. The parameters of the actor-critic network weight gradient-descent update laws are chosen as $\eta_a = 0.1$, $\sigma_a = 50$, $\eta_J = 0.1$, $\sigma_J = 50$. The parameters of the adaptive update laws are chosen as $\eta_{D_1} = 0.01$, $\sigma_{D_1} = 20$, $\eta_{D_3} = 0.001$, $\sigma_{D_3} = 1\times 10^4$, $\eta_1 = 0.001$, $\sigma_1 = 50$, $\eta_3 = 5\times 10^{-7}$, $\sigma_3 = 10$. The filter parameters are chosen as $\tau_1 = 0.2$, $\tau_2 = \tau_3 = 0.1$. The other parameters are $c_0 = 0.1$ and $\varepsilon_{11} = \varepsilon_{12} = \varepsilon_{13} = \varepsilon_{33} = \varepsilon_{14} = \varepsilon_{34} = 1$.
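For convenience, these numerical settings can be collected in a single configuration structure; the variable names are illustrative, and only the values come from the text above.

```python
# Simulation gains and parameters as listed above (variable names are illustrative).
gains = dict(
    k1=5, k2=20, k3=40, k4=150,                           # control gains
    eps_D1=500, eps_D3=100,                               # tanh-bound parameters
    eta_a=0.1, sigma_a=50, eta_J=0.1, sigma_J=50,         # actor-critic update gains
    eta_D1=0.01, sigma_D1=20, eta_D3=0.001, sigma_D3=1e4,  # bound-estimation gains
    eta_1=0.001, sigma_1=50, eta_3=5e-7, sigma_3=10,      # adaptive update gains
    tau_1=0.2, tau_2=0.1, tau_3=0.1,                      # filter time constants
    c0=0.1,
    eps_11=1, eps_12=1, eps_13=1, eps_33=1, eps_14=1, eps_34=1,
)
```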
The simulation results for Case 1, Case 2 and Case 3 are shown in Figs 3, 4 and 5, respectively. The horizontal coordinates of the missile and the target in the two-dimensional plane are shown in Figs 3(a), 4(a) and 5(a), and the vertical coordinates are shown in Figs 3(b), 4(b) and 5(b). Figures 3(c), 4(c) and 5(c) show the system output y (guidance variable χ). Figures 3(d), 4(d) and 5(d) display the normal acceleration $n_L$ of the missile, and Figs 3(e), 4(e) and 5(e) show the pitch rate q. The actuator outputs with effectiveness faults and bias faults are depicted in Figs 3(f), 4(f) and 5(f). The estimates of the upper bounds of the disturbance norms $D_1$, $D_3$ are given in Figs 3(g), (h), 4(g), (h), 5(g) and (h). The norms of the estimates of the parameters $\theta_1$ and $\theta_3$ are shown in Figs 3(i), (j), 4(i), (j), 5(i) and (j). The weight norms of the actor-critic neural networks are depicted in Figs 3(k), (l), 4(k), (l), 5(k) and (l). In conclusion, the results show that the interception strategy χ converges to zero and accurate hit-to-kill interception is maintained even when the target possesses time-varying acceleration and the missile suffers an actuator fault. As a result, the effectiveness of the proposed method is verified.
At the same time, we compare the proposed control method with the backstepping fault-tolerant non-affine integrated guidance and control (BS-FTNAIGC) method, the adaptive boundary estimation fault-tolerant non-affine integrated guidance and control (ABE-FTNAIGC) method and the radial basis function fault-tolerant non-affine integrated guidance and control (RBF-FTNAIGC) method. The initial conditions are as follows: the relative distance between missile and target along the line of sight (LOS) r(0) = 10,816.654m, the angle between the LOS and the horizontal reference line λ = 0.588rad, the target location $x_T$ = 9,000m, $y_T$ = 6,000m, and the target acceleration $A_{Tr} = 5\sin(45t)$, $A_{T\lambda} = 10 + 4\sin((7.8/0.022)t)$. The actuator faults occur at t > 2s with $\lambda_\delta$ = 0.8, $d_\delta$ = 0.05 and at t > 4s with $\lambda_\delta$ = 0.6, $d_\delta$ = 0.12. The other initial conditions are the same as above. The specific comparison is as follows: Fig. 6 displays the system output of the four methods for the NAIGC system, and the actuator fault outputs of the four methods are shown in Fig. 7. According to the simulation results, the controller output and the control objective of this paper are ultimately stable and convergent. At the same time, the overshoot and oscillation of the proposed control method are smaller. Compared with the other methods, the proposed control method stabilises the variables more quickly, and the steady-state error after stabilisation is smaller. It can be seen that the actor-critic RBFNN has great advantages in application to non-affine IGC systems.
5.0 Conclusion
In this paper, the NAIGC system is established for a class of STT missiles with actuator failures, target acceleration variations and coupled multi-source uncertainties. The non-affine problem of the established model is solved by the newly introduced integral expansion system. By introducing the hyperbolic tangent function, the RBFNN and the reinforcement learning actor-critic neural network architecture, different adaptive laws and gradient-descent update laws are constructed, which reduce the effects of actuator faults and target acceleration changes and effectively compensate for the influence of multi-source uncertainty. Therefore, this paper not only designs a non-affine IGC model that is better suited to practical applications, but also proposes a new control method that applies the reinforcement learning actor-critic architecture to NAIGC and achieves accurate guidance. Finally, the effectiveness and superiority of the proposed method are verified by numerical simulation. In the future, we will study three-dimensional IGC and add constraints.
Acknowledgements
This work was supported in part by the Foundation of China National Key Laboratory of Science and Technology on Test Physics & Numerical Mathematics (Grant Number: JP2022-800006000107-237), the Foundation of China National Key Laboratory of Science and Technology on Test Physics & Numerical Mathematics (Grant Number: 08-YY-2023-R11) and the National Natural Science Foundation of China (Grant Number: 62303378). Meanwhile, it was also supported by the Foundation of Shanghai Astronautics Science and Technology Innovation (Grant Number: SAST2022-114).
Competing interests
The authors declare none.