
Asymmetric constrained control scheme design with discrete output feedback in unknown robot–environment interaction system

Published online by Cambridge University Press:  09 September 2022

Xinyi Yu
Affiliation:
Zhejiang University of Technology College of Information Engineering, Hangzhou 310023, China
Huizhen Luo
Affiliation:
Zhejiang University of Technology College of Information Engineering, Hangzhou 310023, China
Shuanwu Shi
Affiliation:
Zhejiang University of Technology College of Information Engineering, Hangzhou 310023, China
Yan Wei
Affiliation:
Zhejiang University of Technology College of Information Engineering, Hangzhou 310023, China
Linlin Ou*
Affiliation:
Zhejiang University of Technology College of Information Engineering, Hangzhou 310023, China
*Corresponding author. E-mail: linlinou@zjut.edu.cn

Abstract

In this paper, an overall structure with an asymmetric constrained controller is constructed for human–robot interaction in uncertain environments. The control structure consists of two decoupled loops. In the outer loop, a discrete output feedback adaptive dynamic programming (OPFB ADP) algorithm is proposed to deal with the problems of unknown environment dynamics and unobservable environment position. Besides, a discount factor is added to the discrete OPFB ADP algorithm to improve the convergence speed. In the inner loop, a constrained controller is developed on the basis of an asymmetric barrier Lyapunov function, and a neural network method is applied to approximate the dynamic characteristics of the uncertain system model. By utilizing this controller, the robot can track the prescribed trajectory precisely within a security boundary. Simulation and experimental results demonstrate the effectiveness of the proposed controller.

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

1. Introduction

The applications of human–robot interaction (HRI) range from material assembly [Reference Rovedal, Pallucca, Pedrocchi, Brafhin and Tosatti1] and rehabilitation [Reference Zhang, Mcdaid, Veale, Peng and Xie2] to elderly care [Reference Murata, Yamazoe, Chung and Lee3]. Because the environment varies, the robot needs to adapt to this uncertainty automatically. Improving the flexibility of the robot’s interaction with the environment therefore becomes a challenging problem.

In the field of HRI control, impedance control [Reference Sharifi, Behzadipour and Vossoighi4] and admittance control [Reference Dohring and Newman5] are two common methods that regulate compliance by adjusting the robot impedance parameters. In the 1980s, the concept of impedance control was proposed by Hogan [Reference Hogan6]. In this method, a damping-spring-mass model is used to represent the dynamic relationship between the environment and the robot position, and the input force of the robot is obtained from the measured robot position with appropriate impedance parameters. Admittance control realizes compliant control by regulating the interaction force and generating the desired trajectory. In early research, the model parameters of these two methods were usually prescribed [Reference Colbaugh and N. Hogan7–Reference Buerger and Hogan9]. In ref. [Reference Newman8], a guaranteed-stable interaction controller was derived for arbitrary linear, time-invariant systems. In ref. [Reference Buerger and Hogan9], a loop-shaping method was developed to design actuator controllers for physically interactive machines. However, inappropriate model parameters can cause large errors. For interaction with humans and environments, both the varying environmental factors and the robot model parameters need to be considered, which makes it difficult to obtain appropriate parameters.

To solve the above problems, it is necessary to introduce the variable impedance method into the field of robot control. The development of reinforcement learning provides the conditions for adaptive dynamic programming (ADP) [Reference Vrabie and Lewis10–Reference Roveda, Maskani, Franceschi, Arash and Pedrocchi12], which is broadly applied to optimal control problems with uncertain dynamics. An adaptive optimization algorithm was proposed for continuous-time systems based on a partially known dynamic model [Reference Vrabie and Lewis10]. An improved algorithm [Reference Jiang and Jiang11] was developed for systems with completely unknown dynamic models. A model-based reinforcement learning variable impedance controller was proposed, in which model predictive control (MPC) was applied to optimize the impedance control parameters online [Reference Roveda, Maskani, Franceschi, Arash and Pedrocchi12]. However, the state constraints and the system stability analysis were not considered. For systems whose states are not fully measurable, the output feedback algorithm is an effective method to estimate the state information [Reference Rizvi and Lin13–Reference Liu, Ge, Zhao and Mei15]. Rizvi established a state observer to estimate the state of the system [Reference Rizvi and Lin13]. Gao et al. reconstructed the state based on the discrete state-space model [Reference Gao, Jiang, Jiang and Chai14]. The output feedback (OPFB) ADP algorithm was selected to obtain the optimal solution while considering an arbitrary environment position/trajectory [Reference Liu, Ge, Zhao and Mei15]. The OPFB ADP algorithm is an effective method to determine the optimal impedance parameters of the human–robot system under unknown environment dynamics.

In HRI tasks, humans collaborate directly with robots, so safety is vital in HRI systems. However, in admittance control, the desired trajectory generated from the interaction force may exceed the regular workspace. In the past decades, much research has been done in the safety control field [Reference Al-Tamimi, Lewis and Abu-Khalaf16–Reference Vicentini, Pedrocchi, Beschi, Giussani and Fogliazza20]. An integrated HRI strategy was proposed to ensure safety through a coordinated suite of safety strategies [Reference Al-Tamimi, Lewis and Abu-Khalaf16]. A cooperative fuzzy-impedance control with embedded velocity and force rules was proposed to enhance safety [Reference Roveda, Haghshenas, Caimmi, Pedrocchi and Tosatti17]. To improve the safety performance, an actuation mechanism was built in ref. [Reference Ahmed and Kalaykov18]. A task-space admittance controller was proposed in which the inertia matrix was conditioned online [Reference Fonseca, Adorno and Fraisse19]. Human–robot collaboration and control were addressed in ref. [Reference Vicentini, Pedrocchi, Beschi, Giussani and Fogliazza20], including safety assessment, advanced force control, human-aware motion planning, gesture recognition, and task scheduling. The concept of the barrier Lyapunov function (BLF) was put forward to guarantee that states remain within certain bounds [Reference Ngo, Mahony and Jiang21]. The BLF method has been widely applied in the control field [Reference Ames, Coogan, Egerstedt, Notomista, Sreenath and Tabuada22–Reference He, Chen and Yin24]: it was employed to prevent violation of the output constraints in ref. [Reference Zhang, Xiao and Ge23] and used in the design of an adaptive impedance control in ref. [Reference He, Chen and Yin24]. Owing to the ability to fit arbitrary nonlinear curves [Reference He, Chen and Yin24–Reference He and Dong26], the neural network (NN) is considered an effective method in control systems, since it requires less prior information about the system dynamics [Reference Yu, Xie, Paszczynski and Wilamowski25]. BLF-based controllers with NN approaches have been studied by many researchers to ensure the safety and stability of robots [Reference He and Dong26–Reference Liu, Lu, C. Tong, Chen, Chen and Li28].

To ensure the high accuracy of the impedance model while guaranteeing the trajectory tracking performance, a double-loop control system was proposed [Reference Li, Liu, Huang, Peng, Pu and Ding29, Reference Modares, Ranatunga, Lewis and Popa30]. This control method decouples the robot-specific inner-loop controller from the task-specific outer-loop control, which accounts for both tracking safety and the environment model. Inspired by this double-loop system, a new two-loop control structure based on force feedback is proposed in this paper. Different from ref. [Reference Modares, Ranatunga, Lewis and Popa30], the environment and robot impedance models are integrated through linear quadratic regulation (LQR). Considering the partially unmeasurable states, an output feedback algorithm is presented to estimate the optimal control policy of the LQR problem in the outer loop. By constructing the asymmetric barrier Lyapunov function (ABLF) controller in the inner loop, the stability of the system is ensured under flexible constraints in the HRI system. An NN in the adaptive controller compensates for the dynamic uncertainties. Simulation and experimental results validate the effectiveness of the inner and outer loops. The primary contributions of this paper are summarized as follows:

  1) A two-loop control structure is designed to assist human operators in performing cooperative tasks with the robot and to optimize the tracking performance. The inner and outer loops are decoupled.

  2) Considering the unknown dynamics and position parameters when interacting with the environment, an optimal impedance adaptation method is used to obtain the optimal control law in the outer loop. The discrete output feedback algorithm realizes optimal control of the complete robot–environment interaction system, and a discount factor is added to the algorithm to improve the convergence speed.

  3) The robot is controlled in the inner loop to track the desired trajectory generated by the admittance control. An ABLF is proposed in the design of the inner-loop controller to restrict the motion space and tracking errors. Compared with symmetric controllers [Reference Liu, Lu, C. Tong, Chen, Chen and Li28], this method is able to adjust the upper and lower bounds to adapt to different task requirements. Meanwhile, a radial basis function neural network (RBFNN) is designed to compensate for the unknown dynamics of the robot, guaranteeing the precision of the admittance model.

The rest of this paper is organized as follows. In Section 2, the HRI system is described. In Section 3, the overall system structure is proposed and the outer and inner loops are described, respectively. In Section 4, simulations of the inner and outer loops are given to verify the proposed methods. In Section 5, experimental results of a practical task conducted on a Franka robot are shown. The conclusion is presented in Section 6.

2. System description

The HRI system includes a rigid robot manipulator and a human, where the end-effector of the robot manipulator interacts with the human physically. The dynamics of the robot manipulator are described by

(1) \begin{equation} {\boldsymbol{{M}}}({\boldsymbol{{q}}})\ddot{{\boldsymbol{{q}}}}+{\boldsymbol{{C}}}({\boldsymbol{{q}}},\dot{{\boldsymbol{{q}}}})\dot{{\boldsymbol{{q}}}}+{\boldsymbol{{G}}}({\boldsymbol{{q}}})={\boldsymbol{\tau }}+{\boldsymbol{{J}}}^{\mathrm{T}}({\boldsymbol{{q}}}){\boldsymbol{{F}}}, \end{equation}

where

(2) \begin{equation} {\boldsymbol{{q}}}=\varphi ({\boldsymbol{{x}}}), \end{equation}
(3) \begin{equation} \dot{{\boldsymbol{{q}}}}={\boldsymbol{{J}}}^{-1}({\boldsymbol{{q}}})\dot{{\boldsymbol{{x}}}}, \end{equation}
(4) \begin{equation} \ddot{{\boldsymbol{{q}}}}=\dot{{\boldsymbol{{J}}}}^{-1}({\boldsymbol{{q}}})\dot{{\boldsymbol{{x}}}}+{\boldsymbol{{J}}}^{-1}({\boldsymbol{{q}}})\ddot{{\boldsymbol{{x}}}}. \end{equation}

Here $\boldsymbol{{q}}$ , $\dot{{\boldsymbol{{q}}}}$ , $\ddot{{\boldsymbol{{q}}}}\in \mathbb{R}^n$ are joint angles, velocities, and accelerations, respectively. $\boldsymbol{{x}}$ , $\dot{{\boldsymbol{{x}}}}$ , $\ddot{{\boldsymbol{{x}}}}\in \mathbb{R}^m$ are the Cartesian positions, velocities, and accelerations of robot end effector. ${\boldsymbol{{M}}}({\boldsymbol{{q}}})\in \mathbb{R}^{n \times n}$ is the inertia matrix, ${\boldsymbol{{C}}}({\boldsymbol{{q}}}, \dot{{\boldsymbol{{q}}}})\in \mathbb{R}^{n \times n}$ is the Coriolis and centripetal coupling matrix, ${\boldsymbol{{G}}}({\boldsymbol{{q}}})\in \mathbb{R}^n$ is the gravity loading, ${\boldsymbol{{J}}}({\boldsymbol{{q}}})\in \mathbb{R}^{n \times n}$ is the Jacobian matrix. $\boldsymbol{\tau }$ is the vector of control input, ${\boldsymbol{{F}}}\in \mathbb{R}^n$ denotes the interaction force with the environment. $\varphi ({\cdot})$ represents the inverse kinematics. $n$ denotes the number of joints, and $m$ denotes the degree of freedom (DOF).

The robot dynamics (1) satisfy the following standard properties:

Property 1. The matrix ${\boldsymbol{{M}}}({\boldsymbol{{q}}})$ is symmetric and positive definite.

Property 2. The matrix $2{\boldsymbol{{C}}}({\boldsymbol{{q}}},\dot{{\boldsymbol{{q}}}})-\dot{{\boldsymbol{{M}}}}({\boldsymbol{{q}}})$ is skew-symmetric.

From the perspective of the robot manipulator, the following impedance model is presented for the safety of HRI:

(5) \begin{equation} {\boldsymbol{{M}}_{\boldsymbol{{d}}}}\ddot{{\boldsymbol{{x}}}}+{\boldsymbol{{C}}_{\boldsymbol{{d}}}}\dot{{\boldsymbol{{x}}}}+{\boldsymbol{{G}}_{\boldsymbol{{d}}}}({\boldsymbol{{x}}}-{\boldsymbol{{x}}_{\boldsymbol{{d}}}})={\boldsymbol{{F}}}, \end{equation}

where ${\boldsymbol{{M}}_{\boldsymbol{{d}}}}\in \mathbb{R}^{n\times n}$ , ${\boldsymbol{{C}}_{\boldsymbol{{d}}}}\in \mathbb{R}^{n\times n}$ , ${\boldsymbol{{G}}_{\boldsymbol{{d}}}}\in \mathbb{R}^{n\times n}$ are the desired inertia, damping, and stiffness matrices, respectively, and $\boldsymbol{{x}}_{\boldsymbol{{d}}}$ denotes the desired trajectory of robot manipulator.

Remark 1: $\boldsymbol{{M}}_{\boldsymbol{{d}}}$ , $\boldsymbol{{C}}_{\boldsymbol{{d}}}$ , and $\boldsymbol{{G}}_{\boldsymbol{{d}}}$ are defined under different interaction behavior requirements.

To discuss the interaction behavior, it is necessary to take the environment dynamics into consideration. Two forms of environment models are presented.

(6) \begin{equation} {\boldsymbol{{M}}_{\boldsymbol{{e}}}}\ddot{{\boldsymbol{{x}}}}+{\boldsymbol{{C}}_{\boldsymbol{{e}}}}\dot{{\boldsymbol{{x}}}}+\boldsymbol{{G}}_{\boldsymbol{{e}}}\boldsymbol{{x}}=-{\boldsymbol{{F}}}, \end{equation}
(7) \begin{equation} \boldsymbol{{C}}_{\boldsymbol{{e}}} {\dot{\boldsymbol{{x}}}}+\boldsymbol{{G}}_{\boldsymbol{{e}}} {{\boldsymbol{{x}}}} =-{\boldsymbol{{F}}}, \end{equation}

where $\boldsymbol{{M}}_{\boldsymbol{{e}}}$ , $\boldsymbol{{C}}_{\boldsymbol{{e}}}$ , $\boldsymbol{{G}}_{\boldsymbol{{e}}}$ are the unknown mass, damping, and stiffness of the environment model, respectively. Eq. (6) represents a mass-damping-stiffness system and (7) represents a damping-stiffness system. The choice between (6) and (7) depends on whether the mass of the interactive environment is significant. For the sake of generality, we take the mass into consideration in the robot–environment model. Thus, from (5) and (6), we have

(8) \begin{equation} ({\boldsymbol{{M}}_{\boldsymbol{{e}}}}+{\boldsymbol{{M}}_{\boldsymbol{{d}}}})\ddot{{\boldsymbol{{x}}}}+({\boldsymbol{{C}}_{\boldsymbol{{e}}}}+{\boldsymbol{{C}}_{\boldsymbol{{d}}}})\dot{{\boldsymbol{{x}}}}+{\boldsymbol{{G}}_{\boldsymbol{{e}}}}\boldsymbol{{x}} +{\boldsymbol{{G}}_{\boldsymbol{{d}}}}({\boldsymbol{{x}}}-{\boldsymbol{{x}}_{\boldsymbol{{d}}}})={\textbf{0}}. \end{equation}

3. Adaptive optimal control with discrete output feedback

3.1. Overall structure

The overall system structure, shown in Fig. 1, is first designed and consists of an outer loop and an inner loop. The outer loop generates the virtual desired trajectory $\boldsymbol{{x}}_{\boldsymbol{{r}}}$ according to the interaction force $\boldsymbol{{f}}$ and the impedance model ${\boldsymbol{{Z}}}({\cdot})$ . The inner loop with an ABLF controller is designed to track the virtual desired trajectory precisely. The main purpose of the controller is to make the robot comply with the human’s objective while guaranteeing safety and tracking accuracy.

Figure 1. System structure of double-loop control.

In general, the desired impedance model in the Cartesian space is written as

(9) \begin{equation} {\boldsymbol{{f}}}={\boldsymbol{{Z}}}({\boldsymbol{{x}}_{\boldsymbol{{d}}}},{\boldsymbol{{x}}_{\boldsymbol{{r}}}}), \end{equation}

where $\boldsymbol{{x}}_{\boldsymbol{{d}}}$ is the desired trajectory and $\boldsymbol{{x}}_{\boldsymbol{{r}}}$ is the reference trajectory in Cartesian space. The virtual desired reference trajectory $\boldsymbol{{q}}_{\boldsymbol{{r}}}$ can then be obtained by using inverse kinematics. The OPFB ADP method is proposed in the outer loop to determine the optimized impedance parameters of the unknown robot–environment dynamics model.

The main purpose of the inner loop is to guarantee the tracking performance for the trajectory $\boldsymbol{{q}}_{\boldsymbol{{r}}}$ . For the safety of the robot arm, an ABLF-based controller is designed to constrain the position and velocity of the robot. The RBFNN is then employed to fit the unknown robot dynamics in the constrained controller.

3.2. OPFB ADP method in outer loop

To optimize the robot–environment interaction, an impedance adaptation method considering the environment dynamics is developed. In this paper, the robot and environmental system are taken into consideration to achieve the desired interaction performance by minimizing the following cost function:

(10) \begin{equation} {\boldsymbol{\Gamma }} =\int _{t}^{\infty }{\left [{{{\dot{{\boldsymbol{{x}}}}}}^{T}}{{{\boldsymbol{{Q}}_{\textbf{1}}}}}\dot{{\boldsymbol{{x}}}}+{{({\boldsymbol{{x}}}-{{\boldsymbol{{x}}_{\boldsymbol{{d}}}}})}^{T}}{\boldsymbol{{Q}}_{\textbf{2}}}({\boldsymbol{{x}}}-{{\boldsymbol{{x}}_{\boldsymbol{{d}}}}})+{{{\boldsymbol{{f}}}}^{T}}{\boldsymbol{{Rf}}} \right ]}d{\boldsymbol{\tau }}, \end{equation}

where $\boldsymbol{{Q}}_{\textbf{1}}$ , $\boldsymbol{{Q}}_{\textbf{2}}$ are positive definite, describing the weights of the velocity and tracking errors, respectively, and $\boldsymbol{{R}}$ is the weight of the interaction force. The state variable is defined as

(11) \begin{equation} {\boldsymbol{\xi }} ={{\left [{{{\dot{{\boldsymbol{{x}}}}}}^{T}},{{{\boldsymbol{{x}}}}^{T}},{\boldsymbol{{s}}_{\textbf{1}}}^{T},{\boldsymbol{{s}}_{\textbf{2}}}^{T} \right ]}^{T}}, \end{equation}

where $\boldsymbol{{s}}_{\textbf{1}}$ and $\boldsymbol{{s}}_{\textbf{2}}$ are two states constructed in the model, which can be written as

(12) \begin{equation} \begin{split} \left \{ \begin{array}{lr} \dot{{\boldsymbol{{s}}_1}}={\boldsymbol{{F}}_{\textbf{1}}\boldsymbol{{s}}_{\textbf{1}}}\\ \\[-8pt]{\boldsymbol{{x}}_{\boldsymbol{{d}}}}={\boldsymbol{{G}}_{\textbf{1}}\boldsymbol{{s}}_{\textbf{1}}} \end{array} \right .,\text{ } \left \{ \begin{array}{lr} \dot{{\boldsymbol{{s}}_2}}={\boldsymbol{{F}}_{\textbf{2}}\boldsymbol{{s}}_{\textbf{2}}}\\ \\[-8pt]{\boldsymbol{{x}}_{\boldsymbol{{e}}}}={\boldsymbol{{G}}_{\textbf{2}}\boldsymbol{{s}}_{\textbf{2}}} \end{array} \right. \end{split}, \end{equation}

where $\boldsymbol{{F}}_{\textbf{1}}$ and $\boldsymbol{{G}}_{\textbf{1}}$ are two known matrices, $\boldsymbol{{F}}_{\textbf{2}}$ and $\boldsymbol{{G}}_{\textbf{2}}$ are unknown matrices. Two linear systems in (12) are utilized to determine the desired trajectory $\boldsymbol{{x}}_{\boldsymbol{{d}}}$ and the varying position of the contact trajectory $\boldsymbol{{x}}_{\boldsymbol{{e}}}$ .

According to (12), the cost function (10) can be rewritten as

(13) \begin{equation} \begin{aligned}{\boldsymbol{\Gamma }} &=\int _{t}^{\infty }{\left [{{{\dot{{\boldsymbol{{x}}}}}}^{T}}{\boldsymbol{{Q}}_{\textbf{1}}}\dot{{\boldsymbol{{x}}}}+\left [ \begin{matrix}{{{\boldsymbol{{x}}}}^{T}} &\quad {\boldsymbol{{x}}_{\boldsymbol{{d}}}}^{T} \\ \end{matrix} \right ]\left [ \begin{matrix}{\boldsymbol{{Q}}_{\textbf{2}}} & -{\boldsymbol{{Q}}_{\textbf{2}}} \\ \\[-8pt] -{\boldsymbol{{Q}}_{\textbf{2}}} &{\boldsymbol{{Q}}_{\textbf{2}}} \\ \end{matrix} \right ]\left [ \begin{matrix}{{{\boldsymbol{{x}}}}^{T}} \\ \\[-8pt] {\boldsymbol{{x}}_{\boldsymbol{{d}}}}^{T} \\ \end{matrix} \right ]+{{{\boldsymbol{{f}}}}^{T}}{\boldsymbol{{Rf}}} \right ]}d{\boldsymbol{\tau }} \\ &=\int _{t}^{\infty }{\left [{{{\boldsymbol{{y}}}}^{T}}{{{\boldsymbol{{Qy}}}}}+{{{\boldsymbol{{f}}}}^{T}}{\boldsymbol{{Rf}}} \right ]}d{\boldsymbol{\tau }}, \end{aligned} \end{equation}

where ${{{\boldsymbol{{Q}}}}}=\left [ \begin{array}{c@{\quad}c}{\boldsymbol{{Q}}_{\textbf{1}}} &{\textbf{0}} \\{\textbf{0}} &{{\boldsymbol{{Q}}}_{\boldsymbol{{2}}}^{\prime}} \end{array} \right ]$ with ${{\boldsymbol{{Q}}}_{\boldsymbol{{2}}}^{\prime}}=\left [ \begin{array}{c@{\quad}c}{\boldsymbol{{Q}}_{\textbf{2}}} & -{\boldsymbol{{Q}}_{\textbf{2}}}{{\boldsymbol{{G}}_{\textbf{1}}}} \\ -{\boldsymbol{{G}}_{\textbf{1}}}^{T}{\boldsymbol{{Q}}_{\textbf{2}}} &{\boldsymbol{{G}}_{\textbf{1}}}^{T}{\boldsymbol{{Q}}_{\textbf{2}}}{\boldsymbol{{G}}_{\textbf{1}}} \end{array}\right ]$ .
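To make the construction of the weighting matrix in (13) concrete, the following minimal Python sketch assembles ${{{\boldsymbol{{Q}}}}}$ from $\boldsymbol{{Q}}_{\textbf{1}}$ , $\boldsymbol{{Q}}_{\textbf{2}}$ , and $\boldsymbol{{G}}_{\textbf{1}}$ ; the function name and the NumPy-based implementation are illustrative choices of this example, not part of the original method.

```python
import numpy as np

def build_output_weight(Q1, Q2, G1):
    """Assemble the output weighting matrix Q of (13) from Q1, Q2, and G1.

    Q1, Q2 : positive-definite weights on velocity and tracking error.
    G1     : output matrix of the desired-trajectory generator in (12).
    """
    Q2p = np.block([[Q2,          -Q2 @ G1],
                    [-G1.T @ Q2,  G1.T @ Q2 @ G1]])
    zero = np.zeros((Q1.shape[0], Q2p.shape[1]))
    return np.block([[Q1, zero], [zero.T, Q2p]])

# Scalar example matching the outer-loop simulation values (Q1 = 1, Q2 = 0.3, G1 = 0.3)
Q = build_output_weight(np.array([[1.0]]), np.array([[0.3]]), np.array([[0.3]]))
```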

For the convenience of calculation, a discrete output feedback method is introduced to solve the adaptive dynamic programming problem. First, the continuous-time state-space model can be written as

(14) \begin{equation} \dot{{\boldsymbol{\xi }}}={{{\boldsymbol{{A}}}}\boldsymbol{\xi }}+{{\boldsymbol{{Bu}}}}, \qquad{{\boldsymbol{{y}}}}={{\boldsymbol{{C}}}}{\boldsymbol{\xi }} \end{equation}

where ${\boldsymbol{\xi }}\in \mathbb{R}^{n}$ is the state, ${\boldsymbol{{y}}}\in \mathbb{R}^p$ is the measured output, and ${\boldsymbol{{u}}}\in \mathbb{R}^m$ is the control input. In the real application, $\boldsymbol{{u}}$ represents the interaction force $\boldsymbol{{f}}$ . Assume that $({{{\boldsymbol{{A}}}}},{{{\boldsymbol{{B}}}}})$ is controllable and $({{{\boldsymbol{{A}}}}},{\boldsymbol{{C}}})$ is observable. The matrices ${{{\boldsymbol{{A}}}}}\in \mathbb{R}^{n\times n}$ and ${{{\boldsymbol{{B}}}}}\in \mathbb{R}^{n\times m}$ include the unknown environment dynamics due to the unmeasurable state $\boldsymbol{{s}}_{\textbf{2}}$ , where

\begin{align*} & {{{\boldsymbol{{A}}}}}=\left [ \begin{array}{c@{\quad}c@{\quad}c@{\quad}c} -{{{{\boldsymbol{{M}}}}}^{-1}}({{\boldsymbol{{C}}_{\boldsymbol{{d}}}}}+{{\boldsymbol{{C}}_{\boldsymbol{{e}}}}}) & -{{{{{\boldsymbol{{M}}}}}}^{-1}}{{\boldsymbol{{G}}_{\boldsymbol{{e}}}}} & 0 & {{{{{\boldsymbol{{M}}}}}}^{-1}}{{\boldsymbol{{G}}_{\boldsymbol{{e}}}}}{{{\boldsymbol{{G}}_{\textbf{2}}}}} \\ \\[-8pt]1 & 0 & 0 & 0 \\ \\[-8pt]0 & 0 & {{\boldsymbol{{F}}_{\textbf{1}}}} & 0 \\ \\[-8pt]0 & 0 & 0 & {{\boldsymbol{{F}}_{\textbf{2}}}} \end{array} \right], \\[3pt] & \boldsymbol{{B}}={{\left [ \begin{array}{c@{\quad}c@{\quad}c@{\quad}c} -{{{{{\boldsymbol{{M}}}}}}^{-1}} & 0 & 0 & 0 \end{array} \right ]}^{T}},\text { }{{{\boldsymbol{{M}}}}}\text {=}{{\boldsymbol{{M}}_{\boldsymbol{{d}}}}}+{{\boldsymbol{{M}}_{\boldsymbol{{e}}}}}. \end{align*}

The output matrix $\boldsymbol{{C}}$ is defined as

\begin{equation*} {{{\boldsymbol{{C}}}}}=\left [ \begin {array}{c@{\quad}c@{\quad}c@{\quad}c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end {array} \right ]. \end{equation*}

Then, a discrete model of (14) using the zero-order hold method is obtained by taking periodic sampling

(15) \begin{equation} {\boldsymbol{\xi }}_{k+1}={\boldsymbol{{A}}_{\boldsymbol{{d}}}}{\boldsymbol{\xi }}_k+{\boldsymbol{{B}}_{\boldsymbol{{d}}}}{\boldsymbol{{u}}}_k, \qquad{\boldsymbol{{y}}}_k={\boldsymbol{{C}}}{\boldsymbol{\xi }}_k, \end{equation}

where ${\boldsymbol{{A}}_{\boldsymbol{{d}}}}=e^{\,{{{\boldsymbol{{A}}}}h}}$ , ${\boldsymbol{{B}}_{\boldsymbol{{d}}}}=(\!\int _{0}^{h}e^{\,{{{\boldsymbol{{A}}}}\tau }}\mathrm{d}\tau ){{{\boldsymbol{{B}}}}}$ , and $h\gt 0$ is a small sampling period. The stabilizing control policy is given as ${\boldsymbol{{u}}}_k={\boldsymbol{\mu }}({\boldsymbol{\xi }}_k)$ , which minimizes the performance index function:

(16) \begin{equation} {{{\boldsymbol{{V}}}}}^{{\boldsymbol{\mu }}}({\boldsymbol{\xi }}_k)=\sum _{i=k}^{\infty }({\boldsymbol{{y}}}_i^{\mathrm{T}}{\boldsymbol{{Q}}_{\boldsymbol{{d}}}}\boldsymbol{{y}}_i+{\boldsymbol{{u}}}_i^{\mathrm{T}}{\boldsymbol{{R}}_{\boldsymbol{{d}}}}{\boldsymbol{{u}}}_i) \end{equation}

with weighting matrices ${\boldsymbol{{Q}}_{\boldsymbol{{d}}}}\geq 0$ , ${\boldsymbol{{R}}_{\boldsymbol{{d}}}}\gt 0$ , and $({\boldsymbol{{A}}_{\boldsymbol{{d}}}},\sqrt{{\boldsymbol{{Q}}_{\boldsymbol{{d}}}}}{\boldsymbol{{C}}})$ is observable and $({\boldsymbol{{A}}_{\boldsymbol{{d}}}},{\boldsymbol{{B}}_{\boldsymbol{{d}}}})$ is controllable.
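Returning to the discretization step, (15) can be reproduced numerically as a sanity check. The sketch below is a minimal Python illustration assuming the continuous-time matrices are available for simulation purposes; in the actual control problem they contain unknown environment dynamics and are never required by the OPFB ADP algorithm. The function name and the use of SciPy are assumptions of this example.

```python
import numpy as np
from scipy.signal import cont2discrete

def discretize(A, B, C, h):
    """Zero-order-hold discretization of (14), giving A_d = e^{A h} and
    B_d = (integral_0^h e^{A s} ds) B as in (15)."""
    D = np.zeros((C.shape[0], B.shape[1]))
    Ad, Bd, Cd, _, _ = cont2discrete((A, B, C, D), h, method='zoh')
    return Ad, Bd, Cd
```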

Since the solution of (16) is similar to the LQR problem, we can find a positive definite matrix ${\boldsymbol{{P}}_{\boldsymbol{{d}}}}\in \mathbb{R}^{n\times n}$ to rewrite (16) into the following form.

(17) \begin{equation} {{{\boldsymbol{{V}}}}}^{\mu }({\boldsymbol{\xi }}_k)={\boldsymbol{\xi }}_k^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{\xi }}_k. \end{equation}

Then, the LQR Bellman equation is obtained as follows:

(18) \begin{equation} {\boldsymbol{\xi }}_k^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{\xi }}_k={\boldsymbol{{y}}}_k^{\mathrm{T}}{\boldsymbol{{Q}}_{\boldsymbol{{d}}}}{\boldsymbol{{y}}}_k+{\boldsymbol{{u}}}_k^{\mathrm{T}}{\boldsymbol{{R}}_{\boldsymbol{{d}}}}{\boldsymbol{{u}}}_k+{\boldsymbol{\xi }}_{k+1}^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{\xi }}_{k+1}. \end{equation}

To find the optimal control, combining (15) and (18) yields

(19) \begin{equation} {\boldsymbol{\xi }}_k^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{\xi }}_k={\boldsymbol{{y}}}_k^{\mathrm{T}}{\boldsymbol{{Q}}_{\boldsymbol{{d}}}}{\boldsymbol{{y}}}_k+{\boldsymbol{{u}}}_k^{\mathrm{T}}{\boldsymbol{{R}}_{\boldsymbol{{d}}}}{\boldsymbol{{u}}}_k+({\boldsymbol{{A}}_{\boldsymbol{{d}}}}{\boldsymbol{\xi }}_k+{\boldsymbol{{B}}_{\boldsymbol{{d}}}}{\boldsymbol{{u}}}_k)^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}({\boldsymbol{{A}}_{\boldsymbol{{d}}}}{\boldsymbol{\xi }}_k+{\boldsymbol{{B}}_{\boldsymbol{{d}}}}{\boldsymbol{{u}}}_k). \end{equation}

To determine the minimizing control, setting the derivative of (19) with respect to ${\boldsymbol{{u}}}_k$ to zero yields

(20) \begin{equation} {\boldsymbol{{u}}}_k=-({\boldsymbol{{R}}_{\boldsymbol{{d}}}}+{\boldsymbol{{B}}_{\boldsymbol{{d}}}}^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{{B}}_{\boldsymbol{{d}}}})^{-1}{\boldsymbol{{B}}_{\boldsymbol{{d}}}}^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{{A}}_{\boldsymbol{{d}}}}{\boldsymbol{\xi }}_k=-{\boldsymbol{{K}}_{\boldsymbol{{d}}}}^*{\boldsymbol{\xi }}_k. \end{equation}

According to ref. [Reference Lewis and Vamvoudakis31], the Riccati equation can be derived as

(21) \begin{equation} {\boldsymbol{{A}}_{\boldsymbol{{d}}}}^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{{A}}_{\boldsymbol{{d}}}}-{\boldsymbol{{P}}_{\boldsymbol{{d}}}}+{\boldsymbol{{C}}}^{\mathrm{T}}{\boldsymbol{{Q}}_{\boldsymbol{{d}}}}{\boldsymbol{{C}}}-{\boldsymbol{{A}}_{\boldsymbol{{d}}}}^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{{B}}_{\boldsymbol{{d}}}}({\boldsymbol{{R}}_{\boldsymbol{{d}}}}+{\boldsymbol{{B}}_{\boldsymbol{{d}}}}^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{{B}}_{\boldsymbol{{d}}}})^{-1}{\boldsymbol{{B}}_{\boldsymbol{{d}}}}^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{{A}}_{\boldsymbol{{d}}}}={\textbf{0}}. \end{equation}
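For comparison with the data-driven solution developed below, the Riccati equation (21) and the gain in (20) can also be solved directly when the model is known. The following sketch is only a model-based baseline; the use of SciPy’s discrete-time algebraic Riccati solver is an assumption of this example.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(Ad, Bd, C, Qd, Rd):
    """Solve (21) with output weighting C^T Qd C and return P_d and the gain K_d* of (20)."""
    Pd = solve_discrete_are(Ad, Bd, C.T @ Qd @ C, Rd)
    Kd = np.linalg.solve(Rd + Bd.T @ Pd @ Bd, Bd.T @ Pd @ Ad)
    return Pd, Kd
```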

For the unmeasurable states, an output feedback method is introduced to parameterize the internal state in terms of the filtered inputs and outputs of the system.

Based on the discretized system model (15), the state ${\boldsymbol{\xi }}_k$ is reconstructed with the previous $N$ inputs and outputs.

(22) \begin{equation} {\boldsymbol{\xi }}_k={\boldsymbol{{E}}_{\boldsymbol{{u}}}}\overline{{\boldsymbol{{u}}}}_{k-1,k-N}+{\boldsymbol{{E}}_{\boldsymbol{{y}}}}\overline{{\boldsymbol{{y}}}}_{k-1,k-N}= \left[\begin{array}{@{}c@{\quad}c@{}}{\boldsymbol{{E}}_{\boldsymbol{{u}}}} &{\boldsymbol{{E}}_{\boldsymbol{{y}}}} \end{array}\right] \left[\begin{array}{c} \overline{{\boldsymbol{{u}}}}_{k-1,k-N}\\ \\[-8pt] \overline{{\boldsymbol{{y}}}}_{k-1,k-N} \end{array}\right] ={\boldsymbol{\Theta }}\overline{{\boldsymbol{{z}}}}_{k-1,k-N}, \end{equation}
(23) \begin{equation} \overline{{\boldsymbol{{u}}}}_{k-1,k-N}= \left[\begin{array}{c}{\boldsymbol{{u}}}_{k-1}\\ \\[-8pt]{ \boldsymbol{{u}}}_{k-2}\\ \\[-8pt]\vdots \\ \\[-8pt]{\boldsymbol{{u}}}_{k-N} \end{array}\right]\in \mathbb{R}^{mN}, \qquad \overline{{\boldsymbol{{y}}}}_{k-1,k-N}= \left[\begin{array}{c}{\boldsymbol{{y}}}_{k-1}\\ \\[-8pt]{\boldsymbol{{y}}}_{k-2}\\ \\[-8pt]\vdots \\ \\[-8pt]{\boldsymbol{{y}}}_{k-N} \end{array}\right]\in \mathbb{R}^{pN}, \end{equation}

where $\overline{{\boldsymbol{{u}}}}_{k-1,k-N}$ and $\overline{{\boldsymbol{{y}}}}_{k-1,k-N}$ are the measured input and output sequences over the time interval $[k-N,k-1]$ , respectively, and $\overline{{\boldsymbol{{z}}}}_{k-1,k-N}=[\overline{{\boldsymbol{{u}}}}_{k-1,k-N}^{\mathrm{T}},\overline{{\boldsymbol{{y}}}}_{k-1,k-N}^{\mathrm{T}}]^{\mathrm{T}}\in \mathbb{R}^l$ , $l=N(m+p)$ , ${\boldsymbol{\Theta }}=[{\boldsymbol{{E}}_{\boldsymbol{{u}}}},{\boldsymbol{{E}}_{\boldsymbol{{y}}}}]$ , and $\boldsymbol{{E}}_{\boldsymbol{{u}}}$ , $\boldsymbol{{E}}_{\boldsymbol{{y}}}$ are the parameter matrices of the two sequences, respectively, which are derived in ref. [Reference Lewis and Vamvoudakis31].
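A minimal sketch of how the data vector $\overline{{\boldsymbol{{z}}}}_{k-1,k-N}$ in (22)–(23) can be assembled from logged inputs and outputs is given below; the helper name and the NumPy representation are assumptions of this example.

```python
import numpy as np

def stacked_io(u_hist, y_hist, k, N):
    """Form z_bar_{k-1,k-N} = [u_{k-1}; ...; u_{k-N}; y_{k-1}; ...; y_{k-N}]
    from logged data, as in (22)-(23).

    u_hist, y_hist : sequences indexed by time step; each entry is a 1-D vector.
    """
    u_part = np.concatenate([np.atleast_1d(u_hist[k - j]) for j in range(1, N + 1)])
    y_part = np.concatenate([np.atleast_1d(y_hist[k - j]) for j in range(1, N + 1)])
    return np.concatenate([u_part, y_part])
```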

Substituting (22) into (17) yields

(24) \begin{equation} {{{\boldsymbol{{V}}}}}^{\mu }({\boldsymbol{\xi }}_k)=\overline{{\boldsymbol{{z}}}}_{k-1,k-N}^{\mathrm{T}} \left[\begin{array}{c}{\boldsymbol{{E}}_{\boldsymbol{{u}}}}^{\mathrm{T}}\\ \\[-8pt]{\boldsymbol{{E}}_{\boldsymbol{{y}}}}^{\mathrm{T}} \end{array}\right]{\boldsymbol{{P}}_{\boldsymbol{{d}}}} \!\left[\begin{array}{@{}c@{\quad}c@{}}{\boldsymbol{{E}}_{\boldsymbol{{u}}}} &{\boldsymbol{{E}}_{\boldsymbol{{y}}}} \end{array}\right] \overline{{\boldsymbol{{z}}}}_{k-1,k-N}\equiv \overline{{\boldsymbol{{z}}}}_{k-1,k-N}^{\mathrm{T}}\overline{{\boldsymbol{{P}}_{\boldsymbol{{d}}}}}\overline{{\boldsymbol{{z}}}}_{k-1,k-N}, \end{equation}

where ${\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}={\boldsymbol{\Theta }}^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{\Theta }}\in \mathbb{R}^{(m+p)N\times (m+p)N}$ . Then, the LQR Bellman equation is written in the form of measured data:

(25) \begin{equation} \overline{{\boldsymbol{{z}}}}_{k-1,k-N}^{\mathrm{T}}{\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}{{\overline{\boldsymbol{{z}}}}}_{k-1,k-N}={\boldsymbol{{y}}}_k^{\mathrm{T}}{\boldsymbol{{Q}}_{\boldsymbol{{d}}}}{\boldsymbol{{y}}}_k+{\boldsymbol{{u}}}_k^{\mathrm{T}}{\boldsymbol{{R}}_{\boldsymbol{{d}}}}{\boldsymbol{{u}}}_k+{{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}^{\mathrm{T}}{\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}{{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}. \end{equation}

The optimal control can be determined online using the temporal difference (TD) RL method to minimize the Bellman TD error [Reference Lewis and Vamvoudakis31]. Based on (25), the Bellman TD error is defined as

(26) \begin{equation} {\boldsymbol{{e}}}_k=-{{\overline{\boldsymbol{{z}}}}}_{k-1,k-N}^{\mathrm{T}}{\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}{{\overline{\boldsymbol{{z}}}}}_{k-1,k-N}+{\boldsymbol{{y}}}_k^{\mathrm{T}}{\boldsymbol{{Q}}_{\boldsymbol{{d}}}}{\boldsymbol{{y}}}_k+{\boldsymbol{{u}}}_k^{\mathrm{T}}{\boldsymbol{{R}}_{\boldsymbol{{d}}}}{\boldsymbol{{u}}}_k+{{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}^{\mathrm{T}}{\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}{{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}. \end{equation}

The optimal policy is obtained by minimizing the value function in terms of the measured data:

(27) \begin{equation} {\boldsymbol{\mu }}({\boldsymbol{\xi }}_k)=arg\min _{\mu _k}\!(\,{\boldsymbol{{y}}}_k^{\mathrm{T}}{\boldsymbol{{Q}}_{\boldsymbol{{d}}}}{\boldsymbol{{y}}}_k+{\boldsymbol{{u}}}_k^{\mathrm{T}}{\boldsymbol{{R}}_{\boldsymbol{{d}}}}{\boldsymbol{{u}}}_k+{{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}^{\mathrm{T}}{{\overline{\boldsymbol{{P}}}}_d}{{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}). \end{equation}

Decomposing ${{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}^{\mathrm{T}}{{\overline{\boldsymbol{{P}}}}_d}{{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}$ , we have

(28) \begin{equation} {{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}^{\mathrm{T}}{\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}{{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}= \left[\begin{array}{c}{\boldsymbol{{u}}}_k\\ \\[-8pt]{{\overline{\boldsymbol{{u}}}}}_{k-1,k-N+1}\\ \\[-8pt]{{\overline{\boldsymbol{{y}}}}}_{k,k-N+1} \end{array}\right]^{\mathrm{T}} \left[\begin{array}{c@{\quad}c@{\quad}c}{\boldsymbol{{p}}}_0 &{\boldsymbol{{p}}}_u &{\boldsymbol{{p}}}_y \\ \\[-8pt]{\boldsymbol{{p}}}_u^{\mathrm{T}} &{\boldsymbol{{P}}}_{22} &{\boldsymbol{{P}}}_{23} \\ \\[-8pt]{\boldsymbol{{p}}}_y^{\mathrm{T}} &{\boldsymbol{{P}}}_{32} &{\boldsymbol{{P}}}_{33} \end{array}\right] \left[\begin{array}{c}{\boldsymbol{{u}}}_k\\ \\[-8pt]{{\overline{\boldsymbol{{u}}}}}_{k-1,k-N+1}\\ \\[-8pt]{{\overline{\boldsymbol{{y}}}}}_{k,k-N+1} \end{array}\right], \end{equation}

where ${\boldsymbol{{p}}}_0\in \mathbb{R}^{m\times m}$ , ${\boldsymbol{{p}}}_u\in \mathbb{R}^{m\times m(N-1)}$ , ${\boldsymbol{{p}}}_y\in \mathbb{R}^{m\times pN}$ . Then, differentiating with respect to ${\boldsymbol{{u}}}_k$ to perform the minimization in (27) yields

(29) \begin{equation} {\boldsymbol{{u}}}_k=-({\boldsymbol{{R}}_{\boldsymbol{{d}}}}+{\boldsymbol{{p}}}_0)^{-1}(\,{\boldsymbol{{p}}}_u{{\overline{\boldsymbol{{u}}}}}_{k-1,k-N+1}+{\boldsymbol{{p}}}_y\,{{\overline{\boldsymbol{{y}}}}}_{k,k-N+1}). \end{equation}

Theorem 1. [Reference Liu, Ge, Zhao and Mei15]: Let $\delta ^j,(\,j=1,2,\ldots )$ be the maximum difference between ${{{\boldsymbol{{V}}}}}^j({\boldsymbol{\xi }}_k)$ and ${{{\boldsymbol{{V}}}}}^*({\boldsymbol{\xi }}_k)$ . When adding a discount factor $\gamma (0\leq \gamma \leq 1)$ to the VI algorithm, the convergence rate of the algorithm to the optimal value ${\boldsymbol{{P}}}^*$ is

(30) \begin{equation} \delta ^{j+1}\leq \gamma \!\delta ^j. \end{equation}

To improve the convergence speed in the iterative computation of the control law, a discount factor is added to the controller. The LQR Bellman equation with the discount factor is then expressed as

(31) \begin{equation} {{\overline{\boldsymbol{{z}}}}}_{k-1,k-N}^{\mathrm{T}}{\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}{{\overline{\boldsymbol{{z}}}}}_{k-1,k-N}={\boldsymbol{{y}}}_k^{\mathrm{T}}{\boldsymbol{{Q}}_{\boldsymbol{{d}}}}{\boldsymbol{{y}}}_k+{\boldsymbol{{u}}}_k^{\mathrm{T}}{\boldsymbol{{R}}_{\boldsymbol{{d}}}}{\boldsymbol{{u}}}_k+ \gamma{{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}^{\mathrm{T}}{\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}{{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}. \end{equation}

The optimal control law with discount factor is written as follows:

(32) \begin{equation} {\boldsymbol{{u}}}_k^*=-({\boldsymbol{{R}}_{\boldsymbol{{d}}}}/\gamma +{\boldsymbol{{p}}}_0)^{-1}(\,{\boldsymbol{{p}}}_u{{\overline{\boldsymbol{{u}}}}}_{k-1,k-N+1}+{\boldsymbol{{p}}}_y{\,{\overline{\boldsymbol{{y}}}}}_{k,k-N+1}). \end{equation}

Thus, the optimal interaction force is obtained as ${\boldsymbol{{f}}}={\boldsymbol{{u}}}_k^{*}$ .
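To illustrate how (31) and (32) can be used with measured data only, the sketch below shows one value-iteration sweep that fits $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ by least squares over collected transitions and then extracts the discounted control law. This is a schematic Python illustration under the assumption that data tuples $(\overline{{\boldsymbol{{z}}}}_{k-1,k-N},{\boldsymbol{{y}}}_k,{\boldsymbol{{u}}}_k,\overline{{\boldsymbol{{z}}}}_{k,k-N+1})$ have already been logged; the function names and the plain batch least-squares formulation are choices of this example, not necessarily the recursive implementation used in the paper.

```python
import numpy as np

def vi_sweep(Pbar, data, Qd, Rd, gamma):
    """One value-iteration update of the discounted Bellman equation (31).

    data : list of tuples (z_prev, y_k, u_k, z_next), where z_prev = z_bar_{k-1,k-N}
           and z_next = z_bar_{k,k-N+1}; Pbar is the current l x l estimate.
    """
    l = Pbar.shape[0]
    regressors, targets = [], []
    for z_prev, y_k, u_k, z_next in data:
        regressors.append(np.kron(z_prev, z_prev))      # z^T P z is linear in vec(P)
        targets.append(y_k @ Qd @ y_k + u_k @ Rd @ u_k
                       + gamma * z_next @ Pbar @ z_next)
    theta, *_ = np.linalg.lstsq(np.array(regressors), np.array(targets), rcond=None)
    P_new = theta.reshape(l, l)
    return 0.5 * (P_new + P_new.T)                       # enforce symmetry

def discounted_policy(Pbar, u_past, y_past, m, Rd, gamma):
    """Discounted output-feedback law (32): partition Pbar into p0, pu, py and
    compute u_k* from the past inputs u_bar_{k-1,k-N+1} and outputs y_bar_{k,k-N+1}."""
    p0 = Pbar[:m, :m]
    pu = Pbar[:m, m:m + u_past.size]
    py = Pbar[:m, m + u_past.size:]
    return -np.linalg.solve(Rd / gamma + p0, pu @ u_past + py @ y_past)
```

The sweep is repeated, feeding each new estimate back in, until the convergence criterion used in Section 4.2 is met.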

3.3. ABLF-based NN inner loop controller

Referring to (1), and taking ${\boldsymbol{{x}}}_1={\boldsymbol{{q}}}$ , ${\boldsymbol{{x}}}_2=\dot{{\boldsymbol{{q}}}}$ , the robot dynamic system can be rewritten as

(33) \begin{align} &\dot{{\boldsymbol{{x}}}}_1={\boldsymbol{{x}}}_2, \nonumber\\[3pt]&\dot{{\boldsymbol{{x}}}}_2={\boldsymbol{{M}}}^{-1}({\boldsymbol{{x}}}_1)[{\boldsymbol{\tau }}+{\boldsymbol{{J}}}^{\mathrm{T}}({\boldsymbol{{x}}}_1){\boldsymbol{{f}}}-{\boldsymbol{{C}}}({\boldsymbol{{x}}}_1,{\boldsymbol{{x}}}_2){\boldsymbol{{x}}}_2-{\boldsymbol{{G}}}({\boldsymbol{{x}}}_1)],\nonumber\\[3pt] &{\boldsymbol{{y}}}={\boldsymbol{{x}}}_1,\end{align}

where ${\boldsymbol{{x}}}_1=[x_{11},x_{12},\cdots,x_{1n}]^{\mathrm{T}}$ , and the desired trajectory is ${\boldsymbol{{x}}_{\boldsymbol{{d}}}}(t)=[q_{d1}(t),q_{d2}(t),\cdots,q_{dn}(t)]^{\mathrm{T}}$ . We need to ensure all states and outputs are within the limits when tracking the desired trajectory, $i.e.$ , $|{\boldsymbol{{x}}}_{1i}|\lt k_{c1i}$ , $|x_{2i}|\lt k_{c2i}$ , $\forall{t}\geq 0$ with $k_{c1i}$ , $k_{c2i}$ being positive constants, where ${\boldsymbol{{k}}_{c1}}=[k_{c11},k_{c12},\cdots,k_{c1n}]^{\mathrm{T}}$ , ${\boldsymbol{{k}}_{c2}}=[k_{c21},k_{c22},\cdots,k_{c2n}]^{\mathrm{T}}$ are positive constant vectors.

Lemma 1: For any positive constant ${\boldsymbol{{k}}_{\boldsymbol{{b}}}}$ and any ${\boldsymbol{{x}}}\in \mathbb{R}$ satisfying $|{\boldsymbol{{x}}}|\lt |{\boldsymbol{{k}}_{\boldsymbol{{b}}}}|$ , the following inequality holds

(34) \begin{equation} \ln \frac{{\boldsymbol{{k}}_{\boldsymbol{{b}}}}^{2p}}{{\boldsymbol{{k}}_{\boldsymbol{{b}}}}^{2p}-{\boldsymbol{{x}}}^{2p}}\leq \frac{{\boldsymbol{{x}}}^{2p}}{{\boldsymbol{{k}}_{\boldsymbol{{b}}}^{2p}}-{\boldsymbol{{x}}}^{2p}}, \end{equation}

where $p$ is a positive integer.

For convenience of expression, the controller proposed for the inner loop is formulated in continuous time. In the real experiment, the torque calculated by the asymmetric constrained controller is applied to the robot in a discrete way, so it is consistent with the discrete outer loop.

To meet the tracking constraint requirements, an ABLF-based controller is employed in the system. The tracking errors are designed as ${\boldsymbol{{z}}}_1=[z_{11},z_{12},\cdots,z_{1n}]^{\mathrm{T}}={\boldsymbol{{x}}}_1-{\boldsymbol{{x}}_{\boldsymbol{{d}}}}$ , ${\boldsymbol{{z}}}_2=[z_{21},z_{22},\cdots,z_{2n}]^{\mathrm{T}}={\boldsymbol{{x}}}_2-{\boldsymbol{\alpha }}$ , and we have $\dot{{\boldsymbol{{z}}}}_1={\boldsymbol{{x}}}_2-{\dot{\boldsymbol{{x}}}_{\boldsymbol{{d}}}}$ . The virtual control $\boldsymbol{\alpha }$ is defined as

(35) \begin{equation} {\boldsymbol{\alpha }}={{\dot{\boldsymbol{{x}}}_{\boldsymbol{{d}}}}}-{\boldsymbol{{T}}}, \end{equation}

where

(36) \begin{equation} \boldsymbol{{T}}= \left[\begin{array}{c} (k_{a1}+z_{11})(k_{b1}-z_{11})k_1z_{11}\\ \\[-8pt](k_{a2}+z_{12})(k_{b2}-z_{12})k_2z_{12}\\ \\[-8pt]\vdots \\ \\[-8pt](k_{an}+z_{1n})(k_{bn}-z_{1n})k_nz_{1n} \end{array}\right]. \end{equation}

Positive constant vectors ${{\boldsymbol{{k}}}_{\boldsymbol{{a}}}}$ and ${{\boldsymbol{{k}}}_{\boldsymbol{{b}}}}$ denote the lower and upper bounds of the tracking errors, where $-{\boldsymbol{{k}}_{\boldsymbol{{a}}}}\lt{{{\boldsymbol{{z}}}_{\textbf{1,2}}}}\lt{\boldsymbol{{k}}_{\boldsymbol{{b}}}}$ , and $k_i$ $\!(i=1,2,\ldots,n)$ denotes a positive constant. Taking the derivatives of $\boldsymbol{{z}}_{\textbf{1}}$ and $\boldsymbol{{z}}_{\textbf{2}}$ with respect to time, we have

(37) \begin{equation} {{\dot{\boldsymbol{{z}}}_{\textbf{1}}}}={\boldsymbol{{z}}_{\textbf{2}}}+{\boldsymbol{\alpha }}-{{\dot{\boldsymbol{{x}}}_{\boldsymbol{{d}}}}}={\boldsymbol{{z}}_{\textbf{2}}}-{\boldsymbol{{T}}}, \end{equation}
(38) \begin{equation} {\dot{\boldsymbol{{z}}}_{\textbf{2}}}={\boldsymbol{{M}}}^{-1}({\boldsymbol{{x}}_{\textbf{1}}})[{\boldsymbol{\tau }}+{\boldsymbol{{J}}}^{\mathrm{T}}({\boldsymbol{{x}}_{\textbf{1}}}){\boldsymbol{{f}}}-{\boldsymbol{{C}}}({\boldsymbol{{x}}_{\textbf{1}}},{\boldsymbol{{x}}_{\textbf{2}}}){\boldsymbol{{x}}_{\textbf{2}}}-{\boldsymbol{{G}}}({\boldsymbol{{x}}_{\textbf{1}}})]-{\boldsymbol{\dot{\alpha }}}. \end{equation}

The ABLF is defined as follows:

(39) \begin{equation} {\boldsymbol{{V}}_{\textbf{1}}}=\frac{h_1({\boldsymbol{{z}}_{\textbf{1}}})}{2p}\sum _{i=1}^n\ln \frac{k_{bi}^{2p}}{k_{bi}^{2p}-z_{1i}^{2p}}+\frac{1-h_1({\boldsymbol{{z}}_{\textbf{1}}})}{2p}\sum _{i=1}^n\ln \frac{k_{ai}^{2p}}{k_{ai}^{2p}-z_{1i}^{2p}}, \end{equation}

where $k_{ai}=x_{di}-\underline{k}_{ci}$ , $k_{bi}=\overline{k}_{ci}-x_{di}$ , $\underline{k}_{ai}\leq k_{ai}\leq \overline{k}_{ai}$ , $\underline{k}_{bi}\leq k_{bi}\leq \overline{k}_{bi}$ , and $h_1\!(z_1)$ is given as

(40) \begin{equation} h_1\!({\boldsymbol{{z}}_{\textbf{1}}})= \begin{cases} 1&{z_{1i}\gt 0}\\ 0&{z_{1i}\leq 0} \end{cases}. \end{equation}
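A minimal sketch of the switching function (40) and the virtual control (35)–(36), evaluated element-wise in Python, is given below; the vectorized treatment of $h_1$ and the function names are assumptions of this illustration.

```python
import numpy as np

def h1(z1):
    """Switching function (40): 1 for positive error components, 0 otherwise."""
    return (z1 > 0).astype(float)

def virtual_control(xd_dot, z1, ka, kb, k_gain):
    """Virtual control (35)-(36): alpha = xd_dot - T with
    T_i = (k_{ai} + z_{1i}) (k_{bi} - z_{1i}) k_i z_{1i}."""
    T = (ka + z1) * (kb - z1) * k_gain * z1
    return xd_dot - T
```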

The time derivative of (39) is

(41) \begin{align} {\dot{\boldsymbol{{V}}}_{\textbf{1}}}&=h_1({\boldsymbol{{z}}_{\textbf{1}}})\sum _{i=1}^n\frac{z_{1i}^{2p-1}\dot{z}_{1i}}{k_{bi}^{2p}-z_{1i}^{2p}}+(1-h_1({\boldsymbol{{z}}_{\textbf{1}}}))\sum _{i=1}^n\frac{z_{1i}^{2p-1}\dot{z}_{1i}}{k_{ai}^{2p}-z_{1i}^{2p}} \nonumber\\ &=-h_1({\boldsymbol{{z}}_{\textbf{1}}})\sum _{i=1}^nk_iz_{1i}^{2p}+h_1({\boldsymbol{{z}}_{\textbf{1}}})\sum _{i=1}^n\frac{z_{1i}^{2p-1}z_{2i}}{k_{bi}^{2p}-z_{1i}^{2p}}-(1-h_1({\boldsymbol{{z}}_{\textbf{1}}}))\sum _{i=1}^nk_iz_{1i}^{2p}+(1-h_1({\boldsymbol{{z}}_{\textbf{1}}}))\sum _{i=1}^n\frac{z_{1i}^{2p-1}z_{2i}}{k_{ai}^{2p}-z_{1i}^{2p}} \nonumber\\ &=-\sum _{i=1}^nk_iz_{1i}^{2p}+h_1({\boldsymbol{{z}}_{\textbf{1}}})\sum _{i=1}^n\frac{z_{1i}^{2p-1}z_{2i}}{k_{bi}^{2p}-z_{1i}^{2p}}+(1-h_1({\boldsymbol{{z}}_{\textbf{1}}}))\sum _{i=1}^n\frac{z_{1i}^{2p-1}z_{2i}}{k_{ai}^{2p}-z_{1i}^{2p}}. \end{align}

Then, a second Lyapunov function is designed as

(42) \begin{equation} {\boldsymbol{{V}}_{\textbf{2}}}={\boldsymbol{{V}}_{\textbf{1}} }+\frac{1}{2}{\boldsymbol{{z}}_{\textbf{2}}}^{\mathrm{T}}{\boldsymbol{{M}}}({\boldsymbol{{x}}_{\textbf{1}}}){\boldsymbol{{z}}_{\textbf{2}}}. \end{equation}

Taking the time derivative of $\boldsymbol{{V}}_{\textbf{2}}$ , we obtain

(43) \begin{equation} {\dot{\boldsymbol{{V}}}_{\textbf{2}}}={\dot{\boldsymbol{{V}}}_{\textbf{1}} }+{\boldsymbol{{z}}_{\textbf{2}}}^{\mathrm{T}}[{\boldsymbol{\tau }}+{\boldsymbol{{J}}}^{\mathrm{T}}({\boldsymbol{{x}}_{\textbf{1}}}){\boldsymbol{{f}}}-{\boldsymbol{{C}}}({\boldsymbol{{x}}_{\textbf{1}}},{\boldsymbol{{x}}_{\textbf{2}}}){\boldsymbol{\alpha }}-{\boldsymbol{{G}}}({\boldsymbol{{x}}_{\textbf{1}}})-{\boldsymbol{{M}}}({\boldsymbol{{x}}_{\textbf{1}}}){\boldsymbol{\dot \alpha }}]. \end{equation}

When ${\boldsymbol{{z}}_{\textbf{2}}}=[0,0,\ldots,0]^{\mathrm{T}}$ , ${\dot{\boldsymbol{{V}}}_{\textbf{2}}}=-\sum _{i=1}^nk_iz_{1i}^{2p}\lt 0$ . According to Lyapunov stability theory, the system is asymptotically stable. Otherwise, if ${\boldsymbol{{z}}_{\textbf{2}}}\neq [0,0,\ldots,0]^{\mathrm{T}}$ , then, according to the property of the Moore-Penrose pseudoinverse, ${\boldsymbol{{z}}_{\textbf{2}}}^{\mathrm{T}}({\boldsymbol{{z}}_{\textbf{2}}}^{\mathrm{T}})^+=1$ . $\boldsymbol{{K}}_{\textbf{2}}$ is a designed positive gain matrix with $\boldsymbol{{K}}_{\textbf{2}}={\boldsymbol{{K}}_{\textbf{2}}}^{\mathrm{T}}\gt 0$ . Then the model-based controller is designed as

(44) \begin{align}{\boldsymbol{\tau}_{0}}& =-{\boldsymbol{{J}}}^{\mathrm{T}}({\boldsymbol{{x}}_{\textbf{1}}}){\boldsymbol{{f}}}+{\boldsymbol{{C}}}({\boldsymbol{{x}}_{\textbf{1}}},{\boldsymbol{{x}}_{\textbf{2}}}){\boldsymbol{\alpha }}+{\boldsymbol{{G}}}({\boldsymbol{{x}}_{\textbf{1}}})+{\boldsymbol{{M}}}({\boldsymbol{{x}}_{\textbf{1}}}){\boldsymbol{\dot \alpha }}-{\boldsymbol{{K}}_{\textbf{2}}}{\boldsymbol{{z}}_{\textbf{2}}} \nonumber\\[3pt]&\quad -h_1({\boldsymbol{{z}}_{\textbf{1}}})({\boldsymbol{{z}}_{\textbf{2}}}^{\mathrm{T}})^+\sum _{i=1}^n\frac{z_{1i}^{2p-1}z_{2i}}{k_{bi}^{2p}-z_{1i}^{2p}}-(1-h_1({\boldsymbol{{z}}_{\textbf{1}}}))({\boldsymbol{{z}}_{\textbf{2}}}^{\mathrm{T}})^+\sum _{i=1}^n\frac{z_{1i}^{2p-1}z_{2i}}{k_{ai}^{2p}-z_{1i}^{2p}}. \end{align}

Substituting (44) into (43) yields

(45) \begin{equation} {\dot{\boldsymbol{{V}}}_{\textbf{2}}}=-\sum _{i=1}^nk_iz_{1i}^{2p}-{\boldsymbol{{z}}_{\textbf{2}}}^{\mathrm{T}}{\boldsymbol{{K}}_{\textbf{2}}}{\boldsymbol{{z}}_{\textbf{2}}}=-({\boldsymbol{{z}}_{\textbf{1}}}^{\mathrm{T}})^p{\boldsymbol{{K}}_{\textbf{1}}}{\boldsymbol{{z}}_{\textbf{1}}}^p-{\boldsymbol{{z}}_{\textbf{2}}}^{\mathrm{T}}{\boldsymbol{{K}}_{\textbf{2}}}{\boldsymbol{{z}}_{\textbf{2}}}\lt 0. \end{equation}

where ${\boldsymbol{{K}}_{\textbf{1}}}={\boldsymbol{{K}}_{\textbf{1}}}^{\mathrm{T}}=diag([k_1,k_2,\ldots,k_n])\gt 0$ is a diagonal matrix.

Eq. (44) is model-based; in real applications, the dynamic parameters ${\boldsymbol{{C}}}({\boldsymbol{{x}}_{\textbf{1}}},{\boldsymbol{{x}}_{\textbf{2}}})$ , ${\boldsymbol{{G}}}({\boldsymbol{{x}}_{\textbf{1}}})$ , ${\boldsymbol{{M}}}({\boldsymbol{{x}}_{\textbf{1}}})$ , and $\boldsymbol{{f}}$ are difficult to obtain. To address the uncertainties in the dynamic model, an adaptive NN method is employed to fit the uncertain terms. The adaptive NN control is proposed as

(46) \begin{align}{{{\boldsymbol{\tau }}}_{1}}&=-{h_1({\boldsymbol{{z}}_{\textbf{1}}}){({\boldsymbol{{z}}_{\textbf{2}}}^{T})}^{+}}\sum \limits _{i=1}^{n}{\frac{{{k}_{i}}z_{1i}^{2p}}{k_{bi}^{2p}-z_{1i}^{2p}}}-{(1-h_1({\boldsymbol{{z}}_{\textbf{1}}})){({\boldsymbol{{z}}_{\textbf{2}}}^{T})}^{+}}\sum \limits _{i=1}^{n}{\frac{{{k}_{i}}z_{1i}^{2p}}{k_{ai}^{2p}-z_{1i}^{2p}}}-{{h}_{1}}({{\boldsymbol{{z}}_{\textbf{1}}}}){{({\boldsymbol{{z}}_{\textbf{2}}}^{T})}^{+}}\sum \limits _{i=1}^{n}{\frac{z_{_{1i}}^{2p-1}{{z}_{2i}}}{k_{bi}^{2p}-z_{1i}^{2p}}} \nonumber\\[3pt]&\quad -(1-{{h}_{1}}({{\boldsymbol{{z}}_{\textbf{1}}}})){{({\boldsymbol{{z}}_{\textbf{2}}}^{T})}^{+}}\sum \limits _{i=1}^{n}{\frac{z_{1i}^{2p-1}{{z}_{2i}}}{k_{ai}^{2p}-z_{1i}^{2p}}}-{{\boldsymbol{{K}}_{\textbf{{2}}}}}{{\boldsymbol{{z}}_{\textbf{2}}}}-{{{\hat{{\boldsymbol{{W}}}}}}^{T}}{\boldsymbol{{S}}}({\boldsymbol{{Z}}})-{{{\boldsymbol{{J}}}}^{T}}({{\boldsymbol{{x}}_{\textbf{1}}}}){\boldsymbol{{f}}}. \end{align}

where ${\boldsymbol{{S}}}({\cdot})$ is the basis function, $\hat{{\boldsymbol{{W}}}}\in{{R}^{l\times n}}$ is the estimated weight matrix in which $l$ is the node number of the NN, and ${\boldsymbol{{Z}}}={{\left [{\boldsymbol{{x}}_{\textbf{1}}}^{T},{\boldsymbol{{x}}_{\textbf{2}}}^{T},{{{\boldsymbol{\alpha }}}^{T}},{{{\dot{{\boldsymbol{\alpha }}}}}^{T}} \right ]}^{T}}$ is the input of the NN basis function. Owing to its ability to approximate any nonlinearity, ${{\hat{{\boldsymbol{{W}}}}}^{T}}{\boldsymbol{{S}}}({\boldsymbol{{Z}}})$ is the estimation of ${{{\boldsymbol{{W}}}}^{*T}}{\boldsymbol{{S}}}({\boldsymbol{{Z}}})$ , where ${{{\boldsymbol{{W}}}}^{*T}}{\boldsymbol{{S}}}({\boldsymbol{{Z}}})$ represents the real value of the unknown term in the model-based control (44) with

(47) \begin{equation} {{{\boldsymbol{{W}}}}^{*T}}{\boldsymbol{{S}}}({\boldsymbol{{Z}}})+{{\varepsilon }} =-{\boldsymbol{{C}}}({{\boldsymbol{{x}}_{\textbf{1}}}},{{\boldsymbol{{x}}_{\textbf{2}}}}){\boldsymbol{\alpha }} -{\boldsymbol{{G}}}({{\boldsymbol{{x}}_{\textbf{1}}}})-{\boldsymbol{{M}}}({{\boldsymbol{{x}}_{\textbf{1}}}})\dot{{\boldsymbol{\alpha }}}. \end{equation}

The adaptive law is designed as

(48) \begin{equation} {{\dot{\hat{{\boldsymbol{{W}}}}}}_{i}}={{{\boldsymbol{\Gamma }}}_{i}}\left [{{\boldsymbol{{S}}}_{i}}({\boldsymbol{{Z}}}){{z}_{2i}}-{{\boldsymbol{{e}}_{i}}}\left |{{z}_{2i}} \right |{{{\hat{{\boldsymbol{{W}}}}}}_{i}} \right ]. \end{equation}
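A minimal Python sketch of a Gaussian RBF basis ${\boldsymbol{{S}}}({\boldsymbol{{Z}}})$ and an Euler-discretized step of the adaptive law (48) is shown below; the explicit-Euler update, the common width $\sigma$ , and the function names are assumptions of this illustration rather than the exact implementation used in the experiments.

```python
import numpy as np

def gaussian_basis(Z, centers, sigma):
    """Gaussian RBF basis S(Z): one node per row of `centers` with common width sigma."""
    d2 = np.sum((centers - Z) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def adapt_step(W_hat, S, z2, Gammas, e, dt):
    """Explicit-Euler step of (48), column i of W_hat corresponding to joint i:
    W_hat_i_dot = Gamma_i [ S(Z) z_{2i} - e_i |z_{2i}| W_hat_i ]."""
    for i in range(W_hat.shape[1]):
        W_dot = Gammas[i] @ (S * z2[i] - e[i] * abs(z2[i]) * W_hat[:, i])
        W_hat[:, i] += dt * W_dot
    return W_hat
```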

Then, the BLF $V_3$ is designed as

(49) \begin{equation} {{\boldsymbol{{V}}_{\textbf{3}}}}={{\boldsymbol{{V}}_{\textbf{2}}}}+\frac{1}{2}\sum \limits _{i=1}^{n}{\tilde{{\boldsymbol{{W}}}}_{i}^{T}{\boldsymbol{\Gamma }} _{i}^{-1}{{{\tilde{{\boldsymbol{{W}}}}}}_{i}}}, \end{equation}

where ${{\tilde{{\boldsymbol{{W}}}}}_{i}}={{\hat{{\boldsymbol{{W}}}}}_{i}}-{\boldsymbol{{W}}}_{i}^{*}$ and ${\tilde{{\boldsymbol{{W}}}}}_{i}$ , ${\hat{{\boldsymbol{{W}}}}}_{i}$ , ${\boldsymbol{{W}}}_{i}^{*}$ are the NN weight error, approximation, and ideal value, respectively. Differentiating (49), we have

(50) \begin{align} {\dot{\boldsymbol{{V}}}_{\textbf{3}}}&=-\sum _{i=1}^{n}{{{k}_{i}}z_{1i}^{2p}}+{{h}_{1}}({{\boldsymbol{{z}}_{\textbf{1}}}})\sum _{i=1}^{n}{\frac{z_{1i}^{2p-1}{{z}_{2i}}}{k_{bi}^{2p}-z_{1i}^{2p}}}+(1-{{h}_{1}}({{\boldsymbol{{z}}_{\textbf{1}}}}))\sum _{i=1}^{n}{\frac{z_{1i}^{2p-1}{{z}_{2i}}}{k_{ai}^{2p}-z_{1i}^{2p}}} \nonumber\\[3pt]&\quad +\sum _{i=1}^{n}{\tilde{{\boldsymbol{{W}}}}_{i}^{T}{\boldsymbol{\Gamma }} _{i}^{-1}{{{\dot{\hat{{\boldsymbol{{W}}}}}}}_{i}}}+{\boldsymbol{{z}}}_{2}^{T}[{{{\boldsymbol{\tau }}}_{1}}+{{{\boldsymbol{{J}}}}^{T}}({{\boldsymbol{{x}}_{\textbf{1}}}}){\boldsymbol{{f}}}-{\boldsymbol{{C}}}({{\boldsymbol{{x}}_{\textbf{1}}}},{{\boldsymbol{{x}}_{\textbf{2}}}}){\boldsymbol{\alpha }} -{\boldsymbol{{G}}}({{\boldsymbol{{x}}_{\textbf{1}}}})-{\boldsymbol{{M}}}({{\boldsymbol{{x}}_{\textbf{1}}}})\dot{{\boldsymbol{\alpha }}}]. \end{align}

If ${{\boldsymbol{{z}}_{\textbf{2}}}}\ne{{\left [ 0,0,\ldots,0 \right ]}^{T}}$ , according to (46)–(48), we have

(51) \begin{equation} \begin{aligned}{\dot{\boldsymbol{{V}}}_{\textbf{3}}}&\leq -\sum _{i=1}^{n}{{{k}_{i}}z_{1i}^{2p}}-{\boldsymbol{{z}}}_{2}^{T}{{\boldsymbol{{K}}_{\textbf{2}}}}{{\boldsymbol{{z}}_{\textbf{2}}}}-{{h}_{1}}({{\boldsymbol{{z}}_{\textbf{1}}}})\sum _{i=1}^{n}{\frac{{{k}_{i}}z_{1i}^{2p}}{k_{bi}^{2p}-z_{1i}^{2p}}}-(1-{{h}_{1}}({{\boldsymbol{{z}}_{\textbf{1}}}}))\sum _{i=1}^{n}{\frac{{{k}_{i}}z_{1i}^{2p}}{k_{ai}^{2p}-z_{1i}^{2p}}}\\ &+\sum _{i=1}^{n}{\left[\tilde{{\boldsymbol{{W}}}}_{i}^{T}{{{\boldsymbol{{S}}}}_{i}}({\boldsymbol{{Z}}}){{z}_{2i}}-{{e}_{i}}|{{z}_{2i}} |\tilde{{\boldsymbol{{W}}}}_{i}^{T}{{{\hat{{\boldsymbol{{W}}}}}}_{i}} \right]}+{\boldsymbol{{z}}_{\textbf{2}}}^{T}{{\varepsilon }} ({\boldsymbol{{Z}}})+{\boldsymbol{{z}}_{\textbf{2}}}^{T}[\!-\!{{{\hat{{\boldsymbol{{W}}}}}}^{T}}{\boldsymbol{{S}}}({\boldsymbol{{Z}}})+{{{\boldsymbol{{W}}}}^{*}}^{T}{\boldsymbol{{S}}}({\boldsymbol{{Z}}})]\\ & \leq -{\boldsymbol{{z}}_{\textbf{2}}}^{T}\left({{\boldsymbol{{K}}_{\textbf{2}}}}-\frac{3}{4}{{{\boldsymbol{{I}}}}}\right){{\boldsymbol{{z}}_{\textbf{2}}}}-{{h}_{1}}({{\boldsymbol{{z}}_{\textbf{1}}}})\sum _{i=1}^{n}{{{k}_{i}}\ln \frac{k_{bi}^{2p}}{k_{bi}^{2p}-z_{1i}^{2p}}-(1-{{h}_{1}}({{\boldsymbol{{z}}_{\textbf{1}}}}))}\sum _{i=1}^{n}{{{k}_{i}}\ln \frac{k_{ai}^{2p}}{k_{ai}^{2p}-z_{1i}^{2p}}}\\ &+\frac{1}{2}{{\|{{\varepsilon }} ({\boldsymbol{{Z}}})\|}^{2}}+\sum _{i=1}^{n}{\frac{{\boldsymbol{{e}}}_{i}^{2}}{4}}({{\|{\boldsymbol{{W}}}_{i}^{*}\|}^{4}}+{{\|{{{\tilde{{\boldsymbol{{W}}}}}}_{i}} \|}^{4}}-2{{\|{\boldsymbol{{W}}}_{i}^{*}\|}^{2}}{{\|{{{\tilde{{\boldsymbol{{W}}}}}}_{i}} \|}^{2}}) \leq -\rho{{\boldsymbol{{V}}_{\textbf{3}}}}+{\boldsymbol{{C}}}, \end{aligned} \end{equation}

where

(52) \begin{equation} \rho =\min\!\left ( \min\!(2{{k}_{i}}),\min \!(2(1-{{h}_{1}}({{\boldsymbol{{z}}_{\textbf{1}}}})){{k}_{i}}),\frac{2{{\lambda }_{\min }}({{\boldsymbol{{K}}_{\textbf{2}}}}-\tfrac{3}{4}{{{\boldsymbol{{I}}}}})}{{{\lambda }_{\max }}({\boldsymbol{{M}}})},\min \!\left ( \frac{{\boldsymbol{{z}}}_{i}^{2}{{\left \|{\boldsymbol{{W}}}_{i}^{*} \right \|}^{2}}}{{{\lambda }_{\max }}({\boldsymbol{\Gamma }} _{i}^{-1})} \right ) \right ),\end{equation}
(53) \begin{equation} {\boldsymbol{{C}}}=\frac{1}{2}{{\left \|{{\varepsilon }} ({\boldsymbol{{Z}}}) \right \|}^{2}}+\sum \limits _{i=1}^{n}{\left ( \frac{{\boldsymbol{{z}}}_{i}^{2}}{4}{{\left \|{\boldsymbol{{W}}}_{i}^{*} \right \|}^{4}}+\frac{{\boldsymbol{{z}}}_{i}^{2}}{4}{{N}^{4}} \right )}. \end{equation}

Here ${{\lambda }_{\min }}({\bullet})$ and ${{\lambda }_{\max }}({\bullet})$ denote the minimum and maximum eigenvalues of the matrix, respectively, and

(54) \begin{equation} \min \!(2{{k}_{i}})\gt 0,\text{ }{{\lambda }_{\min }}\!\left({{\boldsymbol{{K}}_{\textbf{2}}}}-\frac{3}{4}{{{\boldsymbol{{I}}}}}\right)\gt 0, \text{ }N=\left (\frac{\lambda _{\max }({\boldsymbol{\Gamma }}_i)}{\lambda _{\min }({\boldsymbol{\Gamma }}_i)}\right )^{\frac{1}{2}}\frac{{\left \|{\boldsymbol{{s}}}_i \right \|}}{{\boldsymbol{{z}}}_i}+\left \|{\boldsymbol{{W}}}_i^* \right \| \end{equation}

with $\left \|{\boldsymbol{{S}}}_i({\boldsymbol{{Z}}}) \right \| \le \left \|{\boldsymbol{{s}}}_i \right \|$ and ${\boldsymbol{{s}}}_i\gt 0$ .

Theorem 2: Consider the dynamic system described in (1). If the initial conditions satisfy $\left |{{x}_{1i}}(0) \right |\lt{{k}_{c1i}}$ , $\left |{{x}_{2i}}(0) \right |\lt{{k}_{c2i}}$ , the control law (46) ensures that all error signals are semi-globally uniformly bounded. Then the position and velocity constraints are not violated, that is, $\forall t\gt 0$ , $\left |{{x}_{1i}}(t) \right |\le{{k}_{c1i}}$ , $\left |{{x}_{2i}}(t) \right |\le{{k}_{c2i}}$ . The closed-loop error signals $\boldsymbol{{z}}_1$ , $\boldsymbol{{z}}_2$ and $\tilde{{\boldsymbol{{W}}}}$ remain in the compact sets ${\boldsymbol{\Omega }}_{{\boldsymbol{{z}}_{\textbf{1}}}}$ , ${\boldsymbol{\Omega }}_{{\boldsymbol{{z}}_{\textbf{2}}}}$ , ${\boldsymbol{\Omega }}_{{\boldsymbol{{w}}}}$ , respectively.

(55) \begin{align} &{{\boldsymbol{\Omega }}_{{\boldsymbol{{z}}_{\textbf{1}}}}}:=\left \{{{\boldsymbol{{z}}_{\textbf{1}}}}\in{{R}^{n}}|-\sqrt [2p]{k_{ai}^{2p}\left ( 1-{{e}^{-{\boldsymbol{{D}}}}} \right )}\le{{z}_{1i}} \le \sqrt [2p]{k_{bi}^{2p}\left ( 1-{{e}^{-{\boldsymbol{{D}}}}} \right )},i=1,\ldots,n \right \} \nonumber \\[3pt] &{{\boldsymbol{\Omega }}_{{\boldsymbol{{z}}_{\textbf{2}}}}}:=\left \{{{\boldsymbol{{z}}_{\textbf{2}}}}\in{{R}^{n}}|\left \|{{\boldsymbol{{z}}_{\textbf{2}}}} \right \|\le \sqrt{\frac{{\boldsymbol{{D}}}}{{{\lambda }_{\min }}({\boldsymbol{{M}}})}} \right \}, \text{ }{{\boldsymbol{\Omega }}_{\boldsymbol{{w}}}}:=\left \{ \tilde{{\boldsymbol{{W}}}}\in{{R}^{l\times n}}|\left \|{\tilde{{\boldsymbol{{W}}}}} \right \|\le \sqrt{\frac{{\boldsymbol{{D}}}}{{{\lambda }_{\min }}({{\boldsymbol{\Gamma }}^{-1}})}} \right \}, \end{align}

where ${\boldsymbol{{D}}}=2\left ({\boldsymbol{{V}}_{\textbf{3}}}(0)+{\boldsymbol{{C}}}/\rho \right )$ .

Proof: Multiplying both sides of (51) by $e^{\rho t}$ , we have

(56) \begin{equation} e^{\rho t}\dot{{{{\boldsymbol{{V}}}}}}_3\leq -\rho{{{\boldsymbol{{V}}}}}_3e^{\rho t}+{\boldsymbol{{C}}}e^{\rho t}, \end{equation}

and

(57) \begin{equation} d(e^{\rho t}{{{\boldsymbol{{V}}}}_3})/dt\leq{\boldsymbol{{C}}}e^{\rho t} \end{equation}
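To make the step from (57) to (58) explicit, integrating both sides of (57) over $[0,t]$ gives

\begin{equation*} e^{\rho t}{{{\boldsymbol{{V}}}}_3}(t)-{{{\boldsymbol{{V}}}}_3}(0)\leq \frac{{\boldsymbol{{C}}}}{\rho }\left(e^{\rho t}-1\right). \end{equation*}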

Dividing both sides by $e^{\rho t}$ , we have

(58) \begin{equation} {{{\boldsymbol{{V}}}}_3}\leq \left({{{\boldsymbol{{V}}}}_3}(0)-\frac{{\boldsymbol{{C}}}}{\rho }\right)e^{-\rho t}+\frac{{\boldsymbol{{C}}}}{\rho }\leq{{{\boldsymbol{{V}}}}_3}(0)+\frac{{\boldsymbol{{C}}}}{\rho }. \end{equation}

According to (58) and the property of inequality in ref. [Reference Zhang, Dong, Ouyang, Yin and Peng32], we have

(59) \begin{align} -\frac{1}{2}\sum _{i=1}^n\ln{\frac{k_{ai}^{2p}}{k_{ai}^{2p}-z_{1i}^{2p}}}\leq &{{{\boldsymbol{{V}}}}_3}(0)+{\boldsymbol{{C}}}/\rho \leq \frac{1}{2}\sum _{i=1}^n\ln{\frac{k_{bi}^{2p}}{k_{bi}^{2p}-z_{1i}^{2p}}} \nonumber\\[3pt]\frac{1}{2}\|{\boldsymbol{{z}}_{\textbf{2}}}\|^2&\leq \frac{{{{\boldsymbol{{V}}}}_3}(0)+{\boldsymbol{{C}}}/\rho }{\lambda _{min}({\boldsymbol{{M}}})} \nonumber\\[3pt] \frac{1}{2}\|\tilde{{\boldsymbol{{W}}}}\|^2&\leq \frac{{{{\boldsymbol{{V}}}}_3}(0)+{\boldsymbol{{C}}}/\rho }{\lambda _{min}({\boldsymbol{\Gamma }}^{-1})}. \end{align}

Solving these inequalities for $z_{1i}$ , ${\boldsymbol{{z}}_{\textbf{2}}}$ , and $\tilde{{\boldsymbol{{W}}}}$ yields the compact sets ${\boldsymbol{\Omega }}_{{\boldsymbol{{z}}_{\textbf{1}}}}$ , ${\boldsymbol{\Omega }}_{{\boldsymbol{{z}}_{\textbf{2}}}}$ , and ${\boldsymbol{\Omega }}_{\boldsymbol{{w}}}$ in (55), which completes the proof.

4. Simulations

In the simulations, a two-link robotic manipulator simulation platform is constructed as shown in Fig. 2, and the parameters of the robot are defined in Table I. The simulations of the inner and outer loops are carried out separately to validate their respective effectiveness.

Table I. Parameters of the 2-DOF robot manipulator.

Figure 2. Diagram of the 2-DOF robot system.

4.1. Inner loop

To validate the effectiveness and feasibility of the designed asymmetric constrained controller, simulations using the proposed controller, a PD controller, and the controller in ref. [Reference Zheng and Fang33] are carried out in MATLAB.

The initial joint positions and velocities of the robot are given as ${{\boldsymbol{{q}}_{\textbf{1}}}}(0)=[0.4,-0.1](rad)$ , ${{\boldsymbol{{q}}_{\textbf{2}}}}(0)=[0,0](rad/s)$ . The interaction force at the end-effector is ${\boldsymbol{{f}}}(t)=[\!\sin\!(t)+1,\cos\!(t)+0.5](N)$ , where $t\in [0,{{t}_{f}}]$ and $t_f=20s$ . The desired trajectory is set as ${{{\boldsymbol{{q}}}_{\boldsymbol{{r}}}}}=[0.1\sin\!(t)+0.5\cos\!(t),0.1\cos\!(t)+0.5\sin\!(t)](rad)$ . The limits of the tracking errors are ${{{\boldsymbol{{k}}_{\boldsymbol{{a}}}}}}=[\!-\!1.0,-1.0](rad)$ , ${{{{\boldsymbol{{k}}}_{\boldsymbol{{b}}}}}}=[1.2,1.2](rad)$ , and the physical limit of the robot joints is ${{{\boldsymbol{{k}}}_{\boldsymbol{{c}}}}}=[1.8,1.8](rad)$ .
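For reference, the inner-loop reference signals and bounds above can be generated as in the following minimal Python sketch; the 1 ms step size is an assumption of this example, since the paper does not state the inner-loop integration step.

```python
import numpy as np

dt = 0.001                                       # assumed integration step (1 ms)
t = np.arange(0.0, 20.0, dt)                     # horizon t_f = 20 s
q_r = np.vstack([0.1*np.sin(t) + 0.5*np.cos(t),  # desired joint trajectory (rad)
                 0.1*np.cos(t) + 0.5*np.sin(t)])
f = np.vstack([np.sin(t) + 1.0,                  # end-effector interaction force (N)
               np.cos(t) + 0.5])
k_a = np.array([-1.0, -1.0])                     # lower tracking-error bounds (rad)
k_b = np.array([1.2, 1.2])                       # upper tracking-error bounds (rad)
k_c = np.array([1.8, 1.8])                       # joint physical limits (rad)
```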

Figure 3. Tracking performance of different controllers of Joints 1 and 2.

Figure 4. Tracking error of different controllers of Joints 1 and 2.

For PD control, the controller is designed as ${\boldsymbol{\tau }} =-{{\boldsymbol{{G}}_{\textbf{1}}}}({\boldsymbol{{q}}_{\textbf{1}}}-{{\boldsymbol{{q}}}_{\boldsymbol{{r}}}})-{\boldsymbol{{G}}_{\textbf{2}}}({\boldsymbol{{q}}_{\textbf{2}}}-{\dot{\boldsymbol{{q}}}_{\textbf{r}}})$ where ${\boldsymbol{{G}}_{\textbf{1}}}=diag(50,30)$ and ${\boldsymbol{{G}}_{\textbf{2}}}=diag(30,3)$ .

For the proposed controller, Eq. (46) is the control law applied in the inner loop, and ${\boldsymbol{{S}}}({\cdot})$ is defined as a Gaussian function. For the NN, the gain $\sigma$ is chosen as 0.2, the number of NN nodes is $2^8$ , and the centers are selected in the area of $[\!-\!1,1]\times 5$ . The variance of the centers is chosen as 0.5, and the initial value of the NN weight matrix is set to 0. After tuning, the control parameters are set as ${\boldsymbol{{K}}_{\textbf{2}}}=diag(3,3)$ , ${\boldsymbol{{k}}_{\textbf{1}}}={\boldsymbol{{k}}_{\textbf{2}}}=50$ , ${\boldsymbol{{\Gamma }}_{\textbf{1}}}={\boldsymbol{{\Gamma }}_{\textbf{2}}}=100{{{{\boldsymbol{{I}}}}}_{256\times 256}}$ .
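As an illustration of the approximator used here, the sketch below constructs a Gaussian RBF feature vector and a generic $\sigma$-modified weight update. The node count, center range, center variance, $\sigma$, and $\boldsymbol{\Gamma}$ follow the text; the input dimension (read from "$[-1,1]\times 5$") and the exact form of the update are assumptions, standing in for the actual law of Eq. (46) rather than reproducing it.

```python
import numpy as np

l, n_in, n_out = 2 ** 8, 5, 2                  # 256 nodes, assumed 5-dim input, 2 joints
rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, (l, n_in))    # RBF centers in [-1, 1]
var = 0.5                                      # center variance
sigma = 0.2                                    # sigma-modification gain
Gamma = 100.0 * np.eye(l)                      # adaptation gain (100 * I_256)
W_hat = np.zeros((l, n_out))                   # initial NN weight matrix

def rbf_features(x):
    """Gaussian basis vector S(x)."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * var ** 2))

def update_weights(W_hat, x, z2, dt=1e-3):
    """One Euler step of a generic update  W_dot = Gamma * (S(x) z2^T - sigma * W);
    illustrative only, not the authors' Eq. (46)."""
    S = rbf_features(x)
    return W_hat + dt * (Gamma @ (np.outer(S, z2) - sigma * W_hat))
```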

The tracking results are shown in Figs. 3 and 4. From Fig. 4, the tracking errors of the PD controller remain relatively large, while those of the controller in ref. [Reference Zheng and Fang33] are smaller. The tracking errors of the proposed method stay close to zero, outperforming the other methods and confirming the improvement in tracking performance. The control inputs of the three methods are shown in Fig. 5; the torque of the proposed controller is relatively stable.

Figure 5. Control input of three control methods.

4.2. Outer loop

In the outer loop, a robot manipulator with two revolute joints physically interacting with the human/environment is considered. The simulation is implemented in MATLAB.

In the simulation, the sampling interval is set to 1 ms. The adaptation procedure includes three steps. First, exploration noise is added to the initial control input to guarantee the persistent excitation (PE) condition [Reference Vrabie and Lewis10]; it is chosen as ${{e}_{k}}=\sum \nolimits _{\omega =1}^{10}{(0.06/\omega )\sin\!(k\omega )}$ . Then, optimal impedance learning is conducted until the stopping criterion $|{\bar{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}^{\,\,\,j+1}-{\bar{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}^{\,\,\,j}|\lt 0.001$ is satisfied. Finally, the optimized impedance model is obtained from the converged iterations.
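The skeleton of this three-step procedure is sketched below in Python. The probing signal and the stopping threshold are the ones quoted above; `adp_update` is a named placeholder for one OPFB ADP learning step of the paper's algorithm, which is not spelled out in this section.

```python
import numpy as np

def exploration_noise(k):
    """Probing signal e_k = sum_{w=1..10} (0.06/w) * sin(k*w),
    added to the initial control input so that the PE condition holds."""
    w = np.arange(1, 11)
    return float(np.sum((0.06 / w) * np.sin(k * w)))

def learn_impedance(P_bar, adp_update, tol=1e-3, max_iter=100):
    """Iterate the learning step until |P^{j+1} - P^j| < tol (0.001 in the text).
    `adp_update` is a stand-in for one OPFB ADP update of the parameter matrix."""
    for j in range(max_iter):
        P_next = adp_update(P_bar)
        if np.max(np.abs(P_next - P_bar)) < tol:
            return P_next, j + 1      # converged matrix and number of steps
        P_bar = P_next
    return P_bar, max_iter
```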

The parameters in (10) are set as $Q_1=1$ , $Q_2=0.3$ , $R=0.1$ . In the simulation, the environment model is selected as $\ddot{x}+10\dot{x}+100(x-{{x}_{e}})=-F$ . The environment position and the robot desired trajectory in the Cartesian space are given by (12), with $F_1=-1$ , $G_1=0.3$ , $F_2=-1$ , $G_2=1$ . All elements of the initial matrix $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ are selected as 0.01, and $\gamma$ is set to 0.3. Figures 6 and 7 show the convergence of $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ : the parameters ${\boldsymbol{p}}_{\textbf{0}}$ , ${{\boldsymbol{{p}}}_{\boldsymbol{{y}}}}$ and $\boldsymbol{{p}}_{\boldsymbol{{u}}}$ converge after 17 steps.
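The environment model above is a second-order ODE that can be stepped at the 1 ms sampling interval; a minimal forward-Euler sketch (the integration scheme is our choice, not stated in the paper):

```python
def env_step(x, dx, x_e, F, dt=1e-3):
    """One forward-Euler step of the simulated environment
    x_ddot + 10*x_dot + 100*(x - x_e) = -F, sampled at 1 ms."""
    ddx = -10.0 * dx - 100.0 * (x - x_e) - F
    return x + dt * dx, dx + dt * ddx
```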

Figure 6. Convergence of the parameters ${{\boldsymbol{{p}}}_{\boldsymbol{0}}}$ and $\boldsymbol{{p}}_{\boldsymbol{{u}}}$ .

Figure 7. Convergence of the parameters ${{\boldsymbol{{p}}}_{\boldsymbol{{y}}\boldsymbol{0}}}$ , ${{\boldsymbol{{p}}}_{\boldsymbol{{y}}\boldsymbol{1}}}$ , ${{\boldsymbol{{p}}}_{\boldsymbol{{y}}\boldsymbol{2}}}$ and ${{\boldsymbol{{p}}}_{\boldsymbol{{y}}\boldsymbol{3}}}$ .

The optimal values of $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ are shown as follows:

\begin{align*} & p_{0}^{*}=15.9258, \text { }p_{u1}^{*}=-32.8697, \text { }p_{u2}^{*}=42.5815, \text { }p_{u3}^{*}=-29.1587 \\[3pt]& {{\boldsymbol{{p}}}_{\boldsymbol{{y}0}}}^{*}=[\!-4.2895,-2.2678,0], \text { }{{\boldsymbol{{p}}}_{\boldsymbol{{y}1}}}^{*}=[8.8085,-2.0308,0] \\[3pt] & {{\boldsymbol{{p}}}_{\boldsymbol{{y}2}}}^{*}=[1.0536,-1.5435,0], \text { }{{\boldsymbol{{p}}}_{\boldsymbol{{y}3}}}^{*}=[5.7937,-1.1972,0]. \end{align*}

Then, the optimal matrix $\boldsymbol{{P}}_{\boldsymbol{{d}}}$ is obtained from $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ according to (24) as follows:

(60) \begin{equation} {{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}=\left [ \begin{array}{c@{\quad}c@{\quad}c@{\quad}c} -7.5863 & -18.5025 & 9.2325 & 1.9694 \\ \\[-8pt]-18.5025 & -4.2168 & -18.8880 & 11.1250 \\ \\[-8pt]9.2325 & -18.8880 & 19.6568 & -3.7325 \\ \\[-8pt]1.9694 & 11.1250 & -3.7325 & -1.7415 \end{array} \right ]. \end{equation}

For comparison, the ideal optimal matrix ${\boldsymbol{{P}}_{\boldsymbol{{d}}}}^{*}$ is calculated by using the model information as follows:

(61) \begin{equation} {{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}^{*}=\left [ \begin{array}{c@{\quad}c@{\quad}c@{\quad}c} -7.26 & -18.3032 & 9.2331 & 1.8680 \\ \\[-8pt]-18.3032 & -3.9880 & -18.9033 & 11.0891 \\ \\[-8pt]9.2331 & -18.9033 & 19.64 & -3.875 \\ \\[-8pt]1.8680 & 11.0891 & -3.875 & -1.731 \end{array} \right ]. \end{equation}

The matrix ${\boldsymbol{{P}}_{\boldsymbol{{d}}}}^{*}$ is the optimal solution that minimizes the cost function. Comparing the two matrices, the optimal matrix obtained by our method is close to the model-based solution, which validates the correctness of the proposed algorithm.
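One simple way to quantify how close the learned matrix (60) is to the model-based optimum (61) is the relative Frobenius-norm error; the following Python check (the metric itself is our choice, not one used in the paper) gives an error on the order of 1%.

```python
import numpy as np

P_d = np.array([[ -7.5863, -18.5025,   9.2325,  1.9694],
                [-18.5025,  -4.2168, -18.8880, 11.1250],
                [  9.2325, -18.8880,  19.6568, -3.7325],
                [  1.9694,  11.1250,  -3.7325, -1.7415]])       # learned, Eq. (60)

P_d_star = np.array([[ -7.2600, -18.3032,   9.2331,  1.8680],
                     [-18.3032,  -3.9880, -18.9033, 11.0891],
                     [  9.2331, -18.9033,  19.6400, -3.8750],
                     [  1.8680,  11.0891,  -3.8750, -1.7310]])  # model-based, Eq. (61)

rel_err = np.linalg.norm(P_d - P_d_star) / np.linalg.norm(P_d_star)
print(f"relative Frobenius error: {rel_err:.2%}")
```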

Remark 2: Since this method is model-free and data-driven, the parameters have little influence on the results, and there is almost no need to tune them. To further improve the system performance in practical applications, the control parameters can be fine-tuned by adjusting the parameters in the inner and outer loops separately.

5. Experiment

The experiments are carried out on the robot (Franka Emika Panda, Franka Emika GmbH, Germany) shown in Fig. 8. The robot has seven flexible joints (7 degrees of freedom, DOFs), each equipped with angle and torque sensors, and its communication frequency is 1000 Hz. The robot offers three control modes: velocity mode, position mode, and torque mode. The operating system is Ubuntu 16.04 with the ROS Kinetic platform. When the human moves the robot, a desired trajectory is generated, and the virtual trajectory is calculated through the impedance model in real time. Two independent ROS control nodes are employed for computing and controlling the robot, and the communication frequency between the two nodes is 200 Hz. The first node computes the adaptive constrained controller and publishes the control result to the second node. The second node subscribes to the information from the robot and publishes the control signals to the robot.
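The two-node structure can be sketched with standard rospy publishers and subscribers. In the sketch below, the topic names, message type, and the helper `compute_constrained_torque` are illustrative placeholders and are not the actual interface of the experimental code; only the 200 Hz inter-node rate is taken from the text.

```python
#!/usr/bin/env python
import rospy
from std_msgs.msg import Float64MultiArray

def compute_constrained_torque(q, dq):
    """Placeholder for the adaptive constrained control law of the inner loop."""
    return [0.0] * 7

def controller_node():
    # First node: computes the controller and publishes torques to the second node.
    rospy.init_node("adaptive_constrained_controller")
    tau_pub = rospy.Publisher("/joint_torque_cmd", Float64MultiArray, queue_size=1)

    state = {"q": None, "dq": None}
    def state_cb(msg):
        state["q"], state["dq"] = msg.data[:7], msg.data[7:14]
    rospy.Subscriber("/robot_state", Float64MultiArray, state_cb)

    rate = rospy.Rate(200)                     # inter-node communication at 200 Hz
    while not rospy.is_shutdown():
        if state["q"] is not None:
            tau = compute_constrained_torque(state["q"], state["dq"])
            tau_pub.publish(Float64MultiArray(data=tau))
        rate.sleep()

if __name__ == "__main__":
    controller_node()
```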

Figure 8. Human–robot collaboration task.

In order to verify the effectiveness of the proposed controller, a trajectory tracking task is conducted. For the convenience of the experiment, a desired trajectory is defined by the human in the outer loop. As shown in Fig. 1, the virtual desired trajectory $\boldsymbol{{x}}_{\boldsymbol{{r}}}$ , which is calculated from the impedance model, is the input of the inner loop. It is adapted online on the basis of the human interaction with the robot. The parameters of the controller are chosen as follows: ${\boldsymbol{{S}}}({\cdot})$ is the Gaussian function, $l=2^8$ is the number of NN nodes, the centers of each node are selected in the area of $[\!-\!1,1]$ , and the center variance is set to 1.5. The initial NN weight $\hat{{\boldsymbol{{W}}}}$ is 0, the controller parameters are ${\boldsymbol{{K}}_{\textbf{2}}}=diag(10,10,10,10,10,10,10)$ , and the tracking-error constraints are set as ${{k}_{b1}}=2$ , ${{k}_{b2}}=2$ , ${{k}_{b3}}=2$ , ${{k}_{b4}}=3$ , ${{k}_{b5}}=2$ , ${{k}_{b6}}=2.6$ , ${{k}_{b7}}=1.5$ , ${{k}_{a1}}=-1.2$ , ${{k}_{a2}}=-1.2$ , ${{k}_{a3}}=-1.5$ , ${{k}_{a4}}=-2$ , ${{k}_{a5}}=-2$ , ${{k}_{a6}}=-2$ , ${{k}_{a7}}=-1.8$ , with ${{\sigma }_{i}}=0.02$ , ${{\boldsymbol{\Gamma }}_{i}}=200{{{{\boldsymbol{{I}}}}}_{16\times 16}}$ , $\left ( i=1,2,\ldots, 7 \right )$ .

Figure 9. Tracking trajectory of the Franka robot.

Figure 9 shows the tracking trajectories of the three methods; it is clear that the tracking performance of the proposed controller is better than that of the others. Figure 10 shows the tracking errors in each direction, where the green line is the zero baseline. Compared with the other two methods, the trajectory error of the proposed controller is smaller. The main metric for the accuracy of the controller is the tracking error. The average errors of the three controllers are provided in Fig. 11, and Table II lists the average error and error variance of the three methods; both are clearly smaller for the proposed controller. Moreover, the smoothness index (as described in ref. [Reference Roveda, Haghshenas, Caimmi, Pedrocchi and Tosatti17]) of the three controllers is presented in Fig. 12, which demonstrates that the proposed controller is smoother than the other two methods. The input torque computed by the adaptive constrained controller is shown in Fig. 13.
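For reference, the two accuracy metrics can be computed from logged joint trajectories as below; the mean-absolute-error and variance definitions are assumptions on our part, since the exact formulas behind Table II are not restated in this section.

```python
import numpy as np

def error_stats(q_log, q_ref_log):
    """Average tracking error and error variance over a logged trajectory
    (assumed metric definitions, for illustration only)."""
    e = np.asarray(q_log) - np.asarray(q_ref_log)
    return float(np.mean(np.abs(e))), float(np.var(e))
```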

Table II. Error comparison of three controllers.

Figure 10. The tracking errors of different methods.

Figure 11. Average error of three controllers.

Figure 12. Smoothness of three controllers.

Figure 13. Input torques of Joints 1–7.

The calculated control torque is applied to the robot while an HRI task is conducted on the Franka robot. The outer loop aims at optimizing the parameters of the impedance model. For the cost function (10), the parameters $Q_1$ , $Q_2$ and $R$ are set to 1. The stopping criterion is $\left |{\bar{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}^{\,\,\,j+1}-{\bar{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}^{\,\,\,j} \right |\lt 0.01$ , and the discount factor $\gamma$ is selected as 0.3. The initial value of $p_0$ is set to 2, the initial values of ${{\boldsymbol{{p}}}_{\boldsymbol{{y}0}}}$ are set to $\left [ \begin{matrix} \!-2 &\quad 10 &\quad 10 \\ \end{matrix} \right ]$ , and the initial values of $\boldsymbol{{p}}_{\boldsymbol{{u}}}$ are set to 1; all other elements of the initial matrix $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ are 0. The sampling interval is set to 10 ms.

Figures 14 and 15 show that the parameters ${{\boldsymbol{{p}}}_{\boldsymbol{{y}}}}$ , ${{\boldsymbol{{p}}}_{\textbf{0}}}$ and $\boldsymbol{{p}}_{\boldsymbol{{u}}}$ reach convergence after 1000 iterations. The optimal values of $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ are as follows:

Figure 14. Convergence of the parameters $p_{y}$ in matrix $\overline{P}_d$ .

Figure 15. Convergence of the parameters $p_{0}$ , $p_{u}$ in matrix $\overline{P}_d$ .

\begin{align*} &p_0^*=4746.2, \text { }p_{u1}^*=360.8, \text { }p_{u2}^*=-7727.4, \text { }p_{u3}^*=4130.3\\[3pt] &{{\boldsymbol{{p}}}_{\boldsymbol{{y}0}}}^*=[5.54, -2.7, -3.63]\times 10^4, \text { }{{\boldsymbol{{p}}}_{\boldsymbol{{y}1}}}^*=[\!-\!3.84, 6.89, -1.99]\times 10^4\\[3pt] &{{\boldsymbol{{p}}}_{\boldsymbol{{y}2}}}^*=[\!-\!1.59, 3.89, 1.29]\times 10^4, \text { }{{\boldsymbol{{p}}}_{\boldsymbol{{y}3}}}^*=[\!-\!0.173, -8.19, 4.46]\times 10^4.\end{align*}

Figure 16 shows the interaction force, where the interval 0–10 s is the data collection stage and the interval 10–20 s is the calculation stage. After 20 s, the optimal control is applied to the end-effector of the robot, which completes the given task more stably.

Figure 16. Interaction force of three processes.

6. Conclusion

In this paper, to realize safe HRI, a novel impedance adaptation method is proposed for unknown environments. A double-loop system is constructed in which the inner loop improves the tracking accuracy while ensuring the safety of interaction, and the outer loop adapts to the uncertainty of the model. An adaptive dynamic programming method with discrete output feedback is proposed in the outer loop to solve the problem of an uncertain environment position, and a discount factor is added for a faster convergence speed. In the interaction process, the robot is confined to a safe space through an ABLF-based inner-loop controller, while the RBFNN compensates for the unknown dynamics in the controller design. Simulation and experimental results verify the effectiveness of the proposed controller in the HRI system. In future work, a reduced-order observer will be added to reduce the dimensionality of the states so as to increase the speed of calculation.

Author contributions statement

Xinyi Yu and Huizhen Luo conceived and designed the study. Huizhen Luo and Shuanwu Shi performed the experiments. Yan Wei and Linlin Ou revised the article and verified its correctness.

Financial support

This work was supported by the National Key R&D Program of China (2018YFB1308400) and the Natural Science Foundation of Zhejiang Province (LY21F030018).

Conflicts of interest

The authors declare no conflicts of interest.

Ethical approval

This research did not require any ethical approval.

References

Rovedal, L., Pallucca, G., Pedrocchi, N., Brafhin, F. and Tosatti, L. M., “Iterative learning procedure with reinforcement for high-accuracy force tracking in robotized tasks,” IEEE Trans. Ind. Inform. 14(4), 1753–1763 (2018).
Zhang, M., Mcdaid, A., Veale, J. A., Peng, Y. and Xie, S. Q., “Adaptive trajectory tracking control of a parallel ankle rehabilitation robot with joint-space force distribution,” IEEE Access 7, 85812–85820 (2019).
Murata, K., Yamazoe, H., Chung, M. G. and Lee, J. H., “Elderly Care Training Robot for Quantitative Evaluation of Care Operation Development of Robotic Elbow Joint to Imitate Elderly People’s Elbow,” In: IEEE/SICE International Symposium on System Integration (2017).
Sharifi, M., Behzadipour, S. and Vossoighi, G., “Nonlinear model reference adaptive impedance control for human-robot interactions,” Control Eng. Pract. 32, 9–27 (2014).
Dohring, M. and Newman, W., “The Passivity of Natural Admittance Control Implementations,” In: 20th IEEE International Conference on Robotics and Automation (2003) pp. 3710–3715.
Hogan, N., “Impedance control: An approach to manipulation: Part II-Implementation,” J. Dyn. Syst. Meas. Control 107(1), 8–16 (1985).
Colbaugh, J. E. and Hogan, N., “Robust control of dynamically interacting systems,” Int. J. Control 48(1), 65–88 (1988).
Newman, W. S., “Stability and performance limits of interaction controllers,” J. Dyn. Syst. Meas. Control 114(4), 563–570 (1992).
Buerger, S. P. and Hogan, N., “Complementary stability and loop shaping for improved human-robot interaction,” IEEE Trans. Robot. 23(2), 232–244 (2007).
Vrabie, D. and Lewis, F. L., “Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems,” Neural Netw. 22(3), 237–246 (2009).
Jiang, Y. and Jiang, Z. P., “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica 48(10), 2699–2704 (2012).
Roveda, L., Maskani, J., Franceschi, P., Arash, A. and Pedrocchi, N., “Model-based reinforcement learning variable impedance control for human-robot collaboration,” J. Intell. Robot. Syst. 100, 417–433 (2020).
Rizvi, S. A. A. and Lin, Z. L., “Reinforcement learning-based linear quadratic regulation of continuous-time systems using dynamic output feedback,” IEEE Trans. Cybern. 50(11), 4670–4679 (2020).
Gao, W., Jiang, Y., Jiang, Z. P. and Chai, T., “Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming,” Automatica 72(3), 37–45 (2016).
Liu, X., Ge, S. S., Zhao, F. and Mei, X. S., “Optimized impedance adaptation of robot manipulator interacting with unknown environment,” IEEE Trans. Control Syst. Technol. 29(1), 411–419 (2021).
Al-Tamimi, A., Lewis, F. L. and Abu-Khalaf, M., “Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control,” Automatica 43(3), 473–481 (2007).
Roveda, L., Haghshenas, S., Caimmi, M., Pedrocchi, N. and Tosatti, L. M., “Assisting operators in heavy industrial tasks: On the design of an optimized cooperative impedance fuzzy-controller with embedded safety rules,” Front. Robot. AI 6(75), 1–31 (2019).
Ahmed, M. R. and Kalaykov, I., “Static and Dynamic Collision Safety for Human Robot Interaction Using Magnetorheological Fluid Based Compliant Robot Manipulator,” In: IEEE International Conference on Robotics and Biomimetics (2010) pp. 370–375.
Fonseca, M. D. A., Adorno, B. V. and Fraisse, P., “Task-space admittance controller with adaptive inertia matrix conditioning,” J. Intell. Robot. Syst. 101(2), 1 (2021).
Vicentini, F., Pedrocchi, N., Beschi, M., Giussani, M. and Fogliazza, G., “PIROS: Cooperative, Safe and Reconfigurable Robotic Companion for CNC Pallets Load/Unload Stations,” In: Bringing Innovative Robotic Technologies From Research Labs to Industrial End-users (Springer, Cham, 2020) pp. 57–96.
Ngo, K. B., Mahony, R. and Jiang, Z. P., “Integrator Backstepping Using Barrier Functions for Systems with Multiple State Constraints,” In: IEEE Conference on Decision and Control/European Control Conference (CDC-ECC) (2005) pp. 8306–8312.
Ames, A. D., Coogan, S., Egerstedt, M., Notomista, G., Sreenath, K. and Tabuada, P., “Control Barrier Functions: Theory and Applications,” In: European Control Conference (2019) pp. 3420–3431.
Zhang, S., Xiao, S. T. and Ge, W. L., “Approximation-based Control of an Uncertain Robot with Output Constraints,” In: International Conference on Intelligent Control and Automation Science, vol. 46 (2013) pp. 51–56.
He, W., Chen, Y. H. and Yin, Z., “Adaptive neural network control of an uncertain robot with full-state constraints,” IEEE Trans. Cybern. 46(3), 620–629 (2016).
Yu, H., Xie, T. T., Paszczynski, S. and Wilamowski, B. M., “Advantages of radial basis function networks for dynamic system design,” IEEE Trans. Ind. Electron. 58(12), 5438–5450 (2011).
He, W. and Dong, Y. T., “Adaptive fuzzy neural network control for a constrained robot using impedance learning,” IEEE Trans. Neural Netw. Learn. Syst. 29(4), 1174–1186 (2018).
Yao, Q. J., “Neural adaptive learning synchronization of second-order uncertain chaotic systems with prescribed performance guarantees,” Chaos Solitons Fractals 152 (2021).
Liu, Y. J., Lu, S. M., Tong, S. C., Chen, X. K., Chen, C. L. P. and Li, D. J., “Adaptive control-based barrier Lyapunov functions for a class of stochastic nonlinear systems with full state constraints,” Automatica 83, 83–93 (2018).
Li, Z. J., Liu, J. Q., Huang, Z. C., Peng, Y., Pu, H. Y. and Ding, L., “Adaptive impedance control of human-robot cooperation using reinforcement learning,” IEEE Trans. Ind. Electron. 64(10), 8013–8022 (2017).
Modares, H., Ranatunga, I., Lewis, F. L. and Popa, D. O., “Optimized assistive human-robot interaction using reinforcement learning,” IEEE Trans. Cybern. 46(3), 655–667 (2016).
Lewis, F. L. and Vamvoudakis, K. G., “Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data,” IEEE Trans. Syst. Man Cybern. B Cybern. 41(1), 14–25 (2011).
Zhang, S., Dong, Y. T., Ouyang, Y. C., Yin, Z. and Peng, K. X., “Adaptive neural control for robotic manipulator with output constraints and uncertainties,” IEEE Trans. Neural Netw. Learn. Syst. 29(11), 5554–5564 (2018).
Zheng, H. H. and Fang, Z. D., “Research on tracking error of robot arm based on adaptive neural network control,” Mach. Des. Manufact. 6, 139–141 (2019).
Wu, Y. X., Huang, R., Wang, Y. and Wang, J. Q., “Adaptive tracking control of robot manipulators with input saturation and time-varying output constraints,” Asian J. Control 23(3), 1476–1489 (2021).