Hostname: page-component-cd9895bd7-mkpzs Total loading time: 0 Render date: 2024-12-27T07:34:45.118Z Has data issue: false hasContentIssue false

A constrained framework based on IBLF for robot learning with human supervision

Published online by Cambridge University Press:  24 April 2023

Donghao Shi
Affiliation:
School of Mechanical Engineering, Zhejiang Sci-Tech University, Hangzhou, China
Qinchuan Li*
Affiliation:
School of Mechanical Engineering, Zhejiang Sci-Tech University, Hangzhou, China
Chenguang Yang
Affiliation:
Bristol Robotics Laboratory, University of the West of England, Bristol, UK
Zhenyu Lu
Affiliation:
Bristol Robotics Laboratory, University of the West of England, Bristol, UK
*
Corresponding author: Qinchuan Li; Email: lqchuan@zstu.edu.cn
Rights & Permissions [Opens in a new window]

Abstract

Dynamical movement primitives (DMPs) method is a useful tool for efficient robotic skills learning from human demonstrations. However, the DMPs method should know the specified constraints of tasks in advance. One flexible solution is to introduce the human superior experience as part of input. In this paper, we propose a framework for robot learning based on demonstration and supervision. Superior experience supplied by teleoperation is introduced to deal with unknown environment constrains and correct the demonstration for next execution. DMPs model with integral barrier Lyapunov function is used to deal with the constrains in robot learning. Additionally, a radial basis function neural network based controller is developed for teleoperation and the robot to track the generated motions. Then, we prove convergence of the generated path and controller. Finally, we deploy the novel framework with two touch robots to certify its effectiveness.

Type
Research Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press

1. Introduction

Robots are now widely used in many fields, from industrial manufacturing to daily life. They are not limited to structured environments and single, repetitive tasks. To reduce the programming effort for different tasks, learning from demonstrations [Reference Atkeson, Schaal and Systems1] is proposed. The dynamic movement primitives (DMPs) [Reference Schaal, Mohajerian and Ijspeert2] method is a flexible and effective way to transfer manipulation skills from humans to robots easily. DMPs can generalize the actions learning from demonstration and guarantee convergence to a goal position. The method has been successfully applied in many robotic scenarios, such as assembly operations [Reference Tang, Lin, Zhao, Fan and Tomizuka3Reference Lioutikov, Neumann, Maeda and Peters5], robotic surgery [Reference Moro, Nejat and Mihailidis6Reference Osa, Harada, Sugita and Mitsuishi8], and collaborative bimanual tasks [Reference Gams, Nemec, Ijspeert and Ude9].

However, there are many factors cause task failures using DMPs [Reference Argall, Chernova, Veloso and Browning10] including: (1) the export cannot demonstrate every correct action for all the possible states. (2) The environment of the tasks may be variable in actual (e.g., new obstacles and constrains of manipulator). To solve these questions, some scholars introduced human supervision as a part of the system. In ref. [Reference Losey and O’Malley11], Losey and O’Malley introduced the kinematic adjustments to successfully deduce parameters of an optimal policy. In ref. [Reference Nemec, Zlajpah, Slajpa, Piskur and Ude12], Nemec et al. proposed a learning from demonstration framework where DMPs based on kinematic corrections to the behavior of an impedance-controlled robot. In ref. [Reference Hagenow, Senft, Radwin, Gleicher and Zinn13], Hagenow et al. proposed corrective shared autonomy system, corrective shared autonomy is introduced to leverage user input to address uncertainty in robot tasks by targeting corrections to task-specific variables. Hagenow used the method of supervisory control [Reference Sheridan14], which is a kind of remote control methods to remain the responsibility of the human operator. Rather than correcting the robot directly, the remote-based approach is safer and space-free [Reference Si, Guan and Wang15]. In this article, corrections of operator are also introduced by a remote control system.

The correction of expert improves the quality of task completion, but it also raises new questions: (1) correction imposed by supervisor may cause manipulator over its constraint space and (2) and the correction of trajectory cannot be unable to discern the intent. One effective way to deal with constraint problems is barrier Lyapunov function (BLF). In ref. [Reference Yang, Huang, He, Cheng and Systems16], an asymmetric time-varying BLF is used to guarantee the time-varying output constraints. In contrast to the log [Reference Tee, Ge and Tay17] and tan [Reference Jin18] type BLF, integral BLF can limit state signals directly, rather than error signals [Reference Wei, Shuang and Ge19]. Integral barrier Lyapunov function (IBLF) [Reference He, Xue, Yu, Li and Yang20] is proposed to guarantee the end-effector of the robot in the constrained task space. As a result, IBLF is utilized in this article to ensure the manipulator’s end-effector in the restricted task space following the repair of export.

Then correction is divided into two kinds by supervisor: improve the quality of demonstrations and avoid the collision in complex environment. The corrected motion can be used as a new quality demonstration. Collision information can be got from the correction. In ref. [Reference Calinon, D’Halluin, Sauser, Caldwell and Billard21], constraints for DMPs have been successfully treated as point-like obstacles and volumetric obstacles. Based on DMPs model and BLFs, Lu et al. [Reference Lu, Wang and Yang22] propose a BLF-based DMPs framework with the classified constraints.

In this article, the proposed framework is illustrated in Fig. 1, which includes motion generation and single execution. Motion generation uses classic DMPs method with velocity limits inspired by ref. [Reference Si, Wang and Yang23] to learn the corrected path. Single execution part designs a remote control system to ensure the end-effector of manipulator following the learned trajectory, where radial basis function neural network (RBFNN) is employed to approximate the unknown robot dynamics. The trajectory is modified by the correction information of export and the confined space.

Figure 1. Block diagram illustrating the constrained system.

The following is a list of the major contributions:

  • A novel framework based on supervision is proposed that considers both the motion learning and task executing, including a modified DMPs method and a remote-control system for supervision.

  • A new framework combining DMPs and IBLF is proposed to solve constrained trajectory planning problem based on the correction. Velocity limits are also met.

  • A modified remote-control system is introduced, and constrains of system are limited via the IBLF. Furthermore, the stability of the system can be ensured using Lyapunov stability theorem.

The following is the list of paper that is organized as Section 2 introduces basic information of the DMPs and IBLF. The learning process of the corrected motion and remote corrective control system are introduced in Section 3. The experiments are presented in Section 4. Section 5 concludes this paper.

2. Preliminary work

2.1. Dynamics modeling of remote system

The dynamics of the teleoperation system for master and slave in the task space can be modeled as follows:

(1) \begin{equation} M_{m}\!\left(x_{m}\right)\ddot{x}_{m} + C_{m}\!\left(x_{m},\dot{x}_{m}\right)\dot{x}_{m} + G_{m}\!\left(x_{m}\right) + D_{m} =\, f_{m} \end{equation}
(2) \begin{equation} M_{s}\!\left(x_{s}\right)\ddot{x}_{s} + C_{s}\!\left(x_{s},\dot{x}_{s}\right)\dot{x}_{s} + G_{s}\!\left(x_{s}\right) + D_{s} =\, f_{s} \end{equation}

where $x_{m}, \dot{x}_{m}, \ddot{x}_{m}$ and $x_{s}, \dot{x}_{s}, \ddot{x}_{s}$ are the position, velocity, and acceleration signals of end-effort for master and slave manipulators, $M_{m}(x_{m})$ and $M_{s}(x_{s})$ are the inertia matrices, $C_{m}(x_{m},\dot{x}_{m})$ and $C_{s}(x_{s},\dot{x}_{s})$ account centrifugal/Coriolis terms, $G_{m}(x_{m})$ and $G_{s}(x_{s})$ are the gravitational matrices, $D_{m}$ and $D_{s}$ are the modeling errors and external disturbance, and $f_{m}$ and $f_{s}$ are the control input of master and slave devices.

Property 1: The matrices $\dot{M}_{m} - 2C_{m}$ and $\dot{M}_{s} - 2C_{s}$ are skew-symmetric.

2.2. RBFNN

RBFNN is used to approximation manipulator uncertainties to handle the uncertainty issue in the dynamic model. The following introduces the definition of the RBF neural network utilized in this article:

(3) \begin{equation} F\mathit{(}\vartheta \mathit{)} = W^{T}S\!\left(\vartheta \right) + \varepsilon \end{equation}

where $\vartheta$ denotes the input of the neural network, $W=[\omega _{1},\omega _{2},\ldots \omega _{n}]^{T}$ is the ideal weight parameter, $n$ is the number of RBFNN nodes, and $\varepsilon (\vartheta )$ is the approximation errors. $S(\vartheta )=[s_{1}(\vartheta ),s_{2}(\vartheta ),\ldots s_{n}(\vartheta )]^{T}$ is the Gaussian basis function in the form as

(4) \begin{equation} s_{i}\!\left(\vartheta \right)=\exp \!\left({-}\frac{\left(\vartheta -\mathrm{c}_{i}\right)^{2}}{b_{i}^{2}}\right) \end{equation}

where $\mathrm{c}_{i}, \mathrm{b}_{i}$ are the center and width of the neuron. The ideal weight vector $W^{*}$ is an artificial quantity for the purposed method, which aims to minimize the value of $W$ .

(5) \begin{equation} W^*=\text{argmin}\!\left\{|\varepsilon |\right\} \end{equation}

2.3. General DMPs model

DMPs is used to trajectory learning. DMPs consists of two main components: (1) spring-damper-type equation which draws our system to the target and (2) a forcing term that gets the desire behaviors.

The DMPs model is first introduced as

(6) \begin{equation} \left\{\begin{array}{l} \tau \dot{v}=\alpha _{z}\!\left(\beta _{z}\!\left(g-x\right)-v\right)+f(s)\\ \tau \dot{x}=v\\ \tau \dot{s}={-}\alpha _{s}s,s_{0}=1 \end{array}\right. \end{equation}

where $\alpha _{z},$ $\beta _{z}$ are the positive parameters, $x$ is the position variable, $g$ is the goal point, $v$ is the velocity, $\dot{v}$ is the acceleration, $\tau \gt 0$ is the time constant, $s$ is a phase variable that avoid explicit time dependence, the initial value of $s$ is set as 1, and $\alpha _{s}$ is the factor to modify the converging time

The forcing term $f(s)$ is defined as

(7) \begin{equation} f(s)=\frac{\sum _{i=1}^{N} \psi _{i} w_{i}}{\sum _{i=1}^{N} \psi _{i}} s \!\left(g-y_{0}\right) \end{equation}

The forcing term has the components that consist of N Gaussian basis functions, enable the encoding of demonstrated trajectory, where $y_{0}$ is the starting position state, $w_{i}$ is the column of the weight vector, $\psi _{i}$ is Gaussian radial basis function, where $\psi _{i}(s)=\exp\! ({-}h_{i}(\mathrm{s}-\mathrm{c}_{i})^{2}), \mathrm{c}_{i}$ is the center of Gaussian kernels and $h_{i}$ is the variance. The vector $w_{i}$ can be trained with supervised learning algorithms such as locally weighted regression.

The calculating process is proposed to minimize the error function as following:

(8) \begin{equation} \min \!\left\{\,f^{t}(s)-\,f(s)\right\} \end{equation}

where $f(s)$ is an item calculated by the trajectory in demonstration, and $f^{t}(s)$ represents the target value as following:

(9) \begin{equation} f^{t}(s)=\tau \dot{v}-\alpha _{z}\!\left(\beta _{z}(g-x)-v\right) \end{equation}

2.4. General IBLF

Consider the strict feedback nonlinear system described as

(10) \begin{equation} \begin{cases} \dot{x}_{1} =\,f_{1}+g_{1}x_{2}\\ \dot{x}_{2} =\,f_{2}+g_{2}u\\ y =x_{1} \end{cases} \end{equation}

where $f_{1},f_{2},g_{1},g_{2}$ are smooth functions, $x_{1},x_{2}$ are the states, and $u$ and $y$ are the input and output.

Introduce the IBLF candidate in the following:

(11) \begin{equation} V_{1}=\int _{0}^{z_{1}}\frac{\rho k_{c}^{2}}{k_{c}^{2}-\left(\rho +x_{r}\right)^{2}}d\rho \end{equation}

where $k_{c}$ is the constant, $x_{r}$ is the variable, $\rho$ is a member of integrating, and the error variable $z_{1}= x_{1}-x_{r}, z_{2}= x_{2}-\alpha, \alpha$ is a continuously differentiable function.

The time derivative of $V_{1}$ is given by

(12) \begin{equation} \dot{V}_{1}=\frac{z_{1}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}\dot{z}_{1}+\frac{\partial V_{1}}{\partial x_{r}}\dot{x}_{r} \end{equation}

where

(13) \begin{equation} \frac{\partial V_{1}}{\partial x_{r}}=z_{1}\!\left(\frac{k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}-\lambda \right) \end{equation}
(14) \begin{equation} \lambda =\frac{k_{c}}{2z_{1}}\ln \frac{\left(k_{c}+z_{1}+x_{r}\right)\left(k_{c}-x_{r}\right)}{\left(k_{c}-z_{1}-x_{r}\right)\!\left(k_{c}+x_{r}\right)} \end{equation}

The virtual control variable $\alpha$ can be designed as follows:

(15) \begin{equation} \alpha =\frac{1}{g_{1}}\!\left(-k_{1}z_{1}-f_{1}+\frac{g_{1}\!\left(k_{c}^{2}-x_{1}^{2}\right)\dot{x}_{r}\lambda }{k_{c}^{2}}\right) \end{equation}

where $k_{1}$ is positive constant. Substituting (15) into (12), we can get

(16) \begin{equation} \dot{V}_{1}=-\frac{k_{1}z_{1}^{2}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}+\frac{g_{1}z_{1}z_{2}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}} \end{equation}

Then, a Lyapunov function candidate is chosen as follows:

(17) \begin{equation} V_{2}=V_{1}+\frac{1}{2}z_{2}^{2} \end{equation}

The time derivative of $V_{2}$ is given by

(18) \begin{equation} \dot{V}_{2}=-\frac{k_{1}z_{1}^{2}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}+\frac{g_{1}z_{1}z_{2}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}+z_{2}\!\left(f_{2}+g_{2}u-\dot{\alpha }\right) \end{equation}

The control law is designed as

(19) \begin{equation} u=\frac{1}{g_{2}}\!\left(-f_{2}+\dot{\alpha }-k_{2}z_{2}-\frac{g_{1}z_{1}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}\right) \end{equation}

3. Proposed framework

A novel approach that combines DMPs and IBLF is introduced to address limited trajectory planning based on the correction. The remote-control system is made to guarantee that the manipulator end-effector is within the restricted space and to offer export corrective information. The controller of master and salve is designed and analyzed.

3.1. DMPs based on IBLF

The expression of DMPs as (6) can be revised as nonlinear system as follows:

(20) \begin{equation} \left\{\begin{array}{l} \dot{x}_{1}=g_{1}x_{2}\\[2pt] \dot{x}_{2}=\,f_{2}+u+\Delta u\\[2pt] \tau \dot{s}=-\alpha _{s}s,s(0)=1\\[2pt] \dot{\tau }=-\gamma _{0}\!\left(\tau -\tau _{0}\right)+\gamma _{1}\tau v^{T}\sigma (v),\tau _{0}=1\\[2pt] y=x_{1} \end{array}\right. \end{equation}

where $x_{1},x_{2},g_{1}$ represent the $x,v,1/\tau$ in Eq. (5), $f_{2}=(\alpha _{z}(\beta _{z}(g-x)-v)+\gamma \sigma (v))/\tau, u$ is the forcing function $f(s), \Delta u$ is a term added by IBLF, $\gamma, \gamma _{0}, \gamma _{1}$ are the positive constants, to allow the velocity being close to the limit while still not exceeding it, the $\sigma _{i}$ is designed as

(21) \begin{equation} \sigma _{i}\!\left(v_{i}\right)=\frac{v_{i}}{\left(A_{i}-v_{i}\right)\!\left(B_{i}+v_{i}\right)} \end{equation}

where $A_{i}$ , $B_{i}$ are the positive constants.

Similar to the general form of IBLF, we define $z_{1}=x_{1}-x_{r}, z_{2}=x_{2}-\alpha, x_{r}$ is the desired state in control system, while there is no such state in DMPs, so here we introduce the motion generated by DMPs method without any limits as the desired state:

(22) \begin{equation} \left\{\begin{array}{c} \dot{x}_{r}=g_{1}v_{c}\\[3pt] \dot{v}_{c}=\dfrac{\alpha _{z}\!\left(\beta _{z}\!\left(g-x_{r}\right)-v_{c}\right)+f(s)}{\tau } \end{array}\right. \end{equation}

Theorem 1: The output constraint is never violated, and all closed loop signals are confined if the following conditions are met for the DMPs function represented by (20).

(23) \begin{equation} \left\{\begin{array}{l} \alpha =\dfrac{1}{g_{1}}\!\left(-k_{1}z_{1}+\dfrac{g_{1}\!\left(k_{c}^{2}-x^{2}\right)\dot{x}_{r}\lambda }{k_{c}^{2}}\right)\\[14pt] u+\Delta u=-f_{2}+\dot{\alpha }-k_{2}z_{2}-\dfrac{g_{1}z_{1}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}} \end{array}\right. \end{equation}

where $\lambda$ is introduced in (14)

To get the form of $\Delta u$ , we calculate $u$ without added term is

(24) \begin{equation} u=\dot{v}-f_{2} \end{equation}

So

(25) \begin{equation} \Delta u=-k_{2}z_{2}-\frac{g_{1}z_{1}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}+\dot{\alpha }-\dot{v} \end{equation}

Proof:

We synthesize a Lyapunov function as

(26) \begin{equation} V^{c}=V_{1}+\frac{1}{2}z_{2}^{2} \end{equation}

where $V_{1}$ is the IBLF candidate in (11)

Then

(27) \begin{align} \dot{V}^{c}&=\dfrac{z_{1}k_{c}^{2}}{k_{c}^{2}-x^{2}}\dot{z}_{1}+\dfrac{\partial V_{1}}{\partial x_{r}}\dot{x}_{r}+\mathrm{z}_{2}^{\mathrm{T}}\dot{\mathrm{z}}_{2}\nonumber\\ &=\dfrac{z_{1}k_{c}^{2}}{k_{c}^{2}-x^{2}}\!\left(\dot{x}-\dot{x}_{r}\right)+\dfrac{\partial V_{1}}{\partial x_{r}}\dot{x}_{r}+\mathrm{z}_{2}^{\mathrm{T}}\!\left(\dot{v}-\dot{\alpha }\right)\nonumber\\ &=\dfrac{z_{1}k_{c}^{2}}{k_{c}^{2}-x^{2}}\!\left(g_{1}\!\left(z_{2}+\alpha \right)-\dot{x}_{r}\right)+\dfrac{\partial V_{1}}{\partial x_{r}}\dot{x}_{r}\nonumber\\&\quad + \mathrm{z}_{2}^{\mathrm{T}}\!\left(f_{2}+u+\Delta u-\dot{\alpha }\right) \end{align}

Taking the expressions of (16) into (20), we have

(28) \begin{align} \dot{V}^{c} & =\dfrac{g_{1}z_{2}z_{1}k_{c}^{2}}{k_{c}^{2}-x^{2}}-\dfrac{k_{1}z_{_{1}}^{2}k_{c}^{2}}{k_{c}^{2}-x^{2}}+z_{1}\lambda \dot{x}_{r}\nonumber\\&\quad -\dfrac{z_{1}k_{c}^{2}}{k_{c}^{2}-x^{2}}\dot{x}_{r}+z_{1}\!\left(\dfrac{k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}-\lambda \right)\dot{x}_{r}\nonumber\\ &\quad +\mathrm{z}_{2}^{\mathrm{T}}\!\left(f_{2}-f_{2}+\dot{\alpha }-k_{2}z_{2}-\dfrac{g_{1}z_{1}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}-\dot{\alpha }\right)\nonumber\\ &=\dfrac{g_{1}z_{2}z_{1}k_{c}^{2}}{k_{c}^{2}-x^{2}}-\dfrac{k_{1}z_{_{1}}^{2}k_{c}^{2}}{k_{c}^{2}-x^{2}}+\mathrm{z}_{2}^{\mathrm{T}}\!\left(-k_{2}z_{2}-\dfrac{g_{1}z_{1}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}\right)\nonumber\\ &=-\dfrac{k_{1}z_{_{1}}^{2}k_{c}^{2}}{k_{c}^{2}-x^{2}}-k_{2}z_{_{2}}^{2}\leq 0 \end{align}

where $k_{1},k_{2}$ are positive numbers. according to Lyapunov stability theorem, we know that above expression guarantees global stability and the global tracking convergence in the system.

3.2. Master controller design

The control command on the master robot is designed as an impedance controller, such that the position of the robot end effector can be moved by export, let us consider the modeling of the operating torque. In this paper, a damping-stiffness model is considered.

(29) \begin{equation} F_{e}=k_{m}\mathrm{e}_{m}+d_{m}\dot{\mathrm{e}}_{m} \end{equation}

where $k_{m}$ and $d_{m}$ denote designed damping and stiffness matrices, where $e_{m}=x_{md}-x_{m}, x_{md}$ is the desired state, in this remote corrective system, it is fixed value. For avoiding the slave robot moving off the edge of the surface during control, a variable stiffness is proposed

(30) \begin{equation} k_{m}=k_{\textit{min}}+\mathrm{e}^{-d\left(x_{m}\right)}\!\left(k_{\textit{max} }-k_{\textit{min} }\right) \end{equation}

where $\alpha$ is the proximity of distance to boundary, closer get to the boundary, the closer get to 1, where $d(x_{m})$ is the distance to the nearest edge, $k_{\textit{max} }, k_{\textit{min} }$ are positive numbers.

Then, the control torque can be designed as

(31) \begin{equation} f_{m}=\hat{\mathrm{G}}_{m}+\mathrm{u}_{m}+F_{e} \end{equation}

where $u_{m}$ is robust term, and ${\mathop{G}\limits^\frown}_{m}$ is the estimation of $G$ , which satisfies: $G-\hat{\mathrm{G}}_{m}=\varepsilon _{m}$ .

3.3. Slave controller design

Inspired by ref. [Reference Chen, Huang, Sun, Gu and Yao24], the desired position signal $x_{sd}$ can be derived via the slave trajectory creator. For simplicity, a filter as

(32) \begin{equation} V_{f}(s)=1/\!\left(1+\tau _{f}s\right)^{2} \end{equation}

It is used with the input of $x_{m}(t - T(t))$ to create the correction state $x_{sc}$ , then $x_{sd}=\Delta x_{sd}+x_{sdmp}$ , where $\Delta \mathrm{x}_{\mathrm{sd}}=x_{sc}-x_{sd}$ is the forward kinematic function of the Touch $x_{sdmp}$ is the path generate by DMPs.

Due to the disturb and errors in tracking, the trajectory generated by creator may not guarantee the $x$ in the constraint space all the time, we can obtain $x$ by a soft saturation function, then if the x cross the bound it will map the inside state. We derive the following via a saturation function to guarantee that the reference trajectory stays inside the limited region:

(33) \begin{equation} x_{\mathrm{sd}}=\left\{\begin{array}{l} x_{\mathrm{sd}},\left| x_{\mathrm{sd}}\right| \leq \eta k_{{c_{i}}}\\ \eta k_{{c_{i}}},x_{\mathrm{sd}}\lt -\eta k_{{c_{i}}}\\ \eta k_{{c_{i}}},x_{\mathrm{sd}}\gt \eta k_{{c_{i}}} \end{array}\right. \end{equation}

where $\eta$ is a constant very close to 1.

Because of the uncertainity of robot dynamic and model, reference trajectory cannot ensure the end-effector stay in the constrained space. IBLF method is introduced to ensure the constrains of the predefined task space met.

We can obtain the dynamics system of slave

(34) \begin{equation} \left\{\begin{array}{l} \dot{x}_{1}=x_{2}\\ \dot{x}_{2}=M_{s}^{-1}\!\left(f_{s}-D_{s}-G_{s}-C_{s}\dot{x}_{s}\right)\\ y=x_{1} \end{array}\right. \end{equation}

Then, we define $z_{1}=x_{1}-x_{sd}, z_{2}=x_{2}-\alpha$ , where $z_{1}, z_{2}$ are error variables, and $\alpha$ denotes the virtual control variable, we design it as

(35) \begin{equation} a = - k_{s1}z_{1} + \frac{\left(k_{c}^{2} - x_{1}^{2}\right)\dot{x}_{r}\lambda }{k_{c}^{2}} \end{equation}

where $k_{s1}$ is the positive number, $k_{c}$ is the limit in Cartesian space, and $\lambda$ is given in (14)

The control law is designed as

(36) \begin{equation} f_{s} = - \frac{z_{1}k_{c}^{2}}{k_{c}^{2} - x_{1}^{2}} - k_{s2}z_{2} + \hat{W}^{T}S(Z) + u_{s} \end{equation}

In the actual system, the dynamics parameters and are typically unknown. RBFNN is used to approximation manipulator uncertainties to handle the uncertainty issue in the dynamic model. The neural network’s input on the slave side can be chosen as $\mathrm{Z}=[\mathrm{x}_{\mathrm{s}},\dot{\mathrm{x}}_{\mathrm{s}},{\alpha},\dot{{\alpha} }]$ . RBF neural network is defined as

(37) \begin{equation} W^{T}S(Z) = M_{s}\dot{a} + C_{s}a + G_{s} + e_{s} \end{equation}

3.4. Stability analysis

Define the Lyapunov function of master and salve as

(38) \begin{equation} V_{m} = \frac{1}{2}\!\left(\dot{x}_{m}^{T}M_{m}\dot{x}_{m} + e_{m}^{T}k_{m}e_{m}\right) \end{equation}
(39) \begin{equation} V_{s} = V_{1} + \frac{1}{2}z_{2}^{T}M_{s}z_{2} + \frac{1}{2}tr\!\left(\tilde{W}_{s}^{T}G_{s}^{ - 1}\tilde{W}_{s}\right) \end{equation}

where $\tilde{W}=\hat{W}-W$

The Lyapunov candidate function of whole system as

(40) \begin{align} V&=V_{m}+V_{s}\nonumber\\ &=\dfrac{1}{2}\!\left(\dot{\mathrm{x}}_{m}^{T}\mathrm{M}_{m}\dot{\mathrm{x}}_{m}+e_{m}^{T}k_{m}\mathrm{e}_{m}\right)+\mathrm{V}_{1}+\dfrac{1}{2}\mathrm{z}_{2}^{\mathrm{T}}\mathrm{M}_{s}\mathrm{z}_{2}+\dfrac{1}{2}\mathrm{tr}\!\left(\tilde{\mathrm{W}}_{\mathrm{s}}^{\mathrm{T}}{\Gamma} _{\mathrm{s}}^{-1}\tilde{\mathrm{W}}_{\mathrm{s}}\right) \end{align}

Then, the derivative of $\mathrm{V}_{m}$ can be calculated as

(41) \begin{equation} \dot{\mathrm{V}}=\ddot{x}_{m}^{T}\mathrm{M}_{m}\dot{\mathrm{x}}_{m}+\dfrac{1}{2}\dot{x}_{m}^{T}\dot{M}_{m}\dot{\mathrm{x}}_{m}+\dot{e}_{m}^{T}k_{m}\mathrm{e}_{m}+\dot{\mathrm{V}}_{1} +\mathrm{z}_{2}^{\mathrm{T}}\mathrm{M}_{s}\dot{\mathrm{z}}_{2}+\dfrac{1}{2}\mathrm{z}_{2}^{\mathrm{T}}\dot{\mathrm{M}}_{s}\mathrm{z}_{2}+\mathrm{tr}\!\left(\tilde{\mathrm{W}}_{\mathrm{s}}^{\mathrm{T}}{\Gamma} _{\mathrm{s}}^{-1}\dot{\tilde{W}}_{\mathrm{s}}\right) \end{equation}

Combine (1), (2), (16), (41)

(42) \begin{align} \dot{\mathrm{V}} & =\dot{x}_{m}^{T}\!\left(f_{m}-\mathrm{C}_{m}\dot{\mathrm{x}}_{m}-\mathrm{G}_{m}-\mathrm{D}_{m}+\dfrac{1}{2}\dot{M}_{m}\dot{\mathrm{x}}_{m}\right)-\dot{e}_{m}^{T}k_{m}e_{m}\nonumber\\ &\quad -\dfrac{k_{1}z_{1}^{2}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}+\dfrac{z_{1}z_{2}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}+\mathrm{z}_{2}^{\mathrm{T}}\!\left(f_{s}-G_{s}-\mathrm{C}_{s}\dot{\mathrm{x}}_{\mathrm{s}}-D_{s}-\mathrm{M}_{s}\dot{\alpha }\right)\nonumber\\ &\quad +\dfrac{1}{2}\mathrm{z}_{2}^{\mathrm{T}}\dot{\mathrm{M}}_{s}\mathrm{z}_{2}+\mathrm{tr}\tilde{\mathrm{W}}_{\mathrm{s}}^{\mathrm{T}}{\Gamma} _{\mathrm{s}}^{-1}\dot{\tilde{W}} \end{align}

Substituting (31) (36) into (42), we have

(43) \begin{align} \dot{V} & =\dot{x}_{m}^{T}\!\left(\hat{\mathrm{G}}_{m}+\mathrm{u}_{m}+F_{e}-\mathrm{C}_{m}\dot{\mathrm{x}}_{m}-\mathrm{G}_{m}-\mathrm{D}_{m}+\dfrac{1}{2}\dot{M}_{m}\dot{\mathrm{x}}_{m}\right)-\dot{e}_{m}^{T}k_{m}e_{m}\nonumber\\ &\quad -\dfrac{k_{s1}z_{1}^{2}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}+\dfrac{1}{2}\mathrm{z}_{2}^{\mathrm{T}}\!\left(\dot{\mathrm{M}}_{\mathrm{s}}-2\mathrm{C}_{\mathrm{s}}\right)\mathrm{z}_{2}\nonumber\\ &\quad +\mathrm{z}_{2}^{\mathrm{T}}\!\left(\dfrac{z_{1}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}+f_{s}-D_{s}-G_{s}-\mathrm{C}_{\mathrm{s}}{\alpha} -\mathrm{M}_{\mathrm{s}}\!\left(\mathrm{q}\right)\dot{\alpha }\right)\nonumber\\ &\quad +\mathrm{tr}\tilde{\mathrm{W}}_{\mathrm{s}}^{\mathrm{T}}{\Gamma} _{\mathrm{s}}^{-1}\!\left(\dot{\hat{W}}_{s}-\dot{\mathrm{W}}_{\mathrm{s}}\right) \end{align}

then

(44) \begin{align} \dot{V}& = \dot{x}_{m}^{T}\mathit{(}\hat{G}_{m} + u_{m} + F_{e} - G_{m} - D_{m}\mathit{)} - \dot{e}_{m}^{T}k_{m}e_{m} - \frac{k_{s1}z_{1}^{2}k_{c}^{2}}{k_{c}^{2} - x_{1}^{2}}\nonumber\\ &\quad + z_{2}^{T}\!\left(\frac{z_{1}k_{c}^{2}}{k_{c}^{2} - x_{1}^{2}} + f_{s} - D_{s} - G_{s} - C_{s}a - M_{s}\!\left(q\right)\dot{a}\right) + tr\tilde{W}_{s}^{T}G_{s}^{ - 1}\dot{\hat{W}}_{s}\nonumber\\ & = \dot{x}_{m}^{T}\mathit{(}\hat{G}_{m} + u_{m} - G_{m} - D_{m}\mathit{)} + \dot{e}_{m}^{T}\mathit{(}F_{e} - k_{m}e_{m}\mathit{)} - \frac{k_{s1}z_{1}^{2}k_{c}^{2}}{k_{c}^{2} - x_{1}^{2}}\nonumber\\ &\quad - k_{s2}z_{2}^{2} + z_{2}^{T}\!\left(u_{s} + \hat{W}^{T}S (Z) - D_{s} - G_{s} - C_{s}a - M_{s}\!\left(q\right)\dot{a}\right) + tr\tilde{W}_{s}^{T}G_{s}^{ - 1}\dot{\hat{W}}_{s}\nonumber\\ & = - d_{m}\dot{e}_{m}^{2} - \frac{k_{s1}z_{1}^{2}k_{c}^{2}}{k_{c}^{2} - x_{1}^{2}} - k_{s2}z_{2}^{2} + \dot{x}_{m}^{T}\!\left(u_{m} - e_{m} - D_{m}\right)\nonumber\\ &\quad + z_{2}^{T}\!\left(u_{s} - D_{s} - e_{s}\right) + tr\tilde{W}_{s}^{T}\!\left(G_{s}^{ - 1}\dot{\hat{W}}_{s} + z_{2}^{T}S\!\left(Z\right)\right) \end{align}

Furthermore, the robust term in a controller, which is used to deal with estimating error, external disturbance, and modeling error, can be created as

(45) \begin{equation} \mathrm{u}_{m}=-\!\left(d_{m}+{\unicode[Times]{x025B}} _{mb}\right)sgn\!\left(\dot{x}_{m}\right) \end{equation}

where $\| \mathrm{D}_{m}\| \leq d_{\mathrm{m}}, \| {\unicode[Times]{x025B}} _{m}\| \leq {\unicode[Times]{x025B}} _{mb}, d_{\mathrm{m}}$ and ${\unicode[Times]{x025B}} _{mb}$ are positive constants.

(46) \begin{equation} \mathrm{u}_{\mathrm{s}}=-\!\left(d_{\mathrm{s}}+{\unicode[Times]{x025B}} _{sb}\right)\mathrm{sgn}\!\left(z_{2}\right) \end{equation}

where $\| D_{s}\| \leq d_{s}, \| {\unicode[Times]{x025B}} _{s}\| \leq {\unicode[Times]{x025B}} _{sb}$ $d_{s}$ and ${\unicode[Times]{x025B}} _{sb}$ are positive constants.

Thus, the adaptive law, which is used to estimate the RBF neural network parameters online and real time, can be designed as

(47) \begin{equation} \dot{\hat{W}}_{s}=-{\Gamma} _{s}\mathrm{S}\!\left(\mathrm{Z}\right)\mathrm{z}_{2}^{\mathrm{T}} \end{equation}

then

(48) \begin{align} \dot{\mathrm{V}} & \leq -\dot{e}_{m}^{T}D\dot{\mathrm{e}}_{m}-\left| \dot{\mathrm{x}}_{m}^{T}\right| \!\left(d_{m}+{\unicode[Times]{x025B}} _{mb}+\varepsilon _{m}+D_{m}\right)\nonumber\\ &\quad -\frac{k_{1}z_{1}^{2}k_{c}^{2}}{k_{c}^{2}-x_{1}^{2}}-K_{2}\mathrm{z}_{2}^{2}+\left| \mathrm{z}_{2}^{\mathrm{T}}\right| \!\left(d_{s}+{\unicode[Times]{x025B}} _{sb}+\varepsilon _{s}+D_{s}\right)\nonumber\\ &\leq 0 \end{align}

According to the Lyapunov stability theorem, the Lyapunov function is uniformly positive defined, its derivative is negative defined. The aforementioned formula ensures global stability and the convergence of global tracking in the system employing the suggested controller.

4. Experiment

The proposed method has been verified on two touch robots, which are haptic devices manufactured by 3D Systems. As shown in Fig. 2, the device has three degrees-of-freedom can be driven by a torque controller. In our experiments, two touch robots were served as master and slave. The dynamics of two touch robot can be modeled as (1) and (2), where the knowledge of the dynamic parameters is estimated by RBF neural network in the teleoperation process. The proposed method is tested by two groups of experiments:

Figure 2. Experimental setup for testing.

The test of controllers: The slave robot is performed to move along the learned trajectory to accomplish the task, and trajectory is modified through operating the master robot in remote environment by human operator.

The test of trajectory learning: The IBLF-based DMPs is used to generate a learned trajectory. Constraints of task space and velocity are considered.

4.1. Constrained control effect

In this part, we apply the designed controllers to verify work of the proposed algorithm. The experiments are performed on the two touch robots. First, a demonstration trajectory is given by operator, then the slave robot moves along the trajectory and the master robot is operated to modify the trajectory of the slave robot.

For the master robot, the parameters are chosen as $k_{min}=10, k_{\max }=100, d_{m}+\varepsilon _{mb}=0.1, D=1,$ ${\Gamma} _{m}=1, x_{md}=[50,0,60]$ is near the mid of its workspace. For the slave robot, control parameters are chosen as $k_{s1}=10, k_{c}=60$ and $k_{s2}=15, d_{s}+\varepsilon _{sb}=0.1, {\Gamma} _{s}=1$ . The slave robot receives the correction information from the master after filter and then applies the soft saturation function to generate the desired trajectory of the end effector in task space. Parameter of soft saturation function is designed as $\eta =0.98$ , the weight parameters of the RBF NN are initialized as 0, and the centers of the functions are distributed in the interval [0, 1]. The human operates the master device toward boundaries in the y-axes orderly.

The experiment results are shown in Fig. 3, our suggested controller guarantees that the end-effector tracks the reference trajectory in real time while operating inside the restricted area. Operator can feel the resistance when moving away from the hold position which prevent accidental contact. When the operator forced the slave to cross the bound of $y=60$ , the soft saturation function and IBLF controller work together to ensure the end-effector stay below the limit.

Figure 3. Results of remote corrective control.

4.2. Trajectory learning

The second group of experiments aims to test the ILBF-based motion model. The ability of constrains of task space and the velocity are tested. A drawing task is designed for the test. The common parameters of DMPs are chosen as $\alpha _{s}=5, \alpha _{z}=10, \beta _{z}=100$ and others are set separately in each experiment.

To validate the correction performance of the IBLF-based DMPs, the parameters of IBLF are $k_{1}=1, k_{2}=10, k_{c}=8$ . The speed constraint item is added to the IBLF-based method, and the selected parameter is $A=100, B=-100, \gamma =10, \gamma _{0}=5, \gamma _{1}=10$ .

As shown in Fig. 4, during the operation, the track was corrected by teleoperation equipment, and an obstacle was added in the early stage of the experiment to make the movement within the edge. The corrected trajectory will be learned then, we hope that the learned trajectory will not cross the limit where we get the information from correction.

Figure 4. Motion generation of different methods.

Figure 5. Motion generation of different methods.

Then the corrected trajectory is learned by a classic DMPs method and a modified DMPs method proposed in this article. As shown in Fig. 5, The classical DMPs method is compared with the IBLF-based method. In the top of the figure, the generalization process is shown, both can learn the characteristics of the trajectory, and finally approach the target point. However, it shows that the red line across the limit around $x=-50$ . While the bule line always stays within the constraints. It can be clear expression in the mid of the figure. It is the relationship between time and location. We can see that the red line over the border at $t=0.4$ . Compared with the classical method, IBLF-based method can constrain the motion within the set range. However, due to the few selected neural network nodes, the local features cannot be perfectly expressed. The bottom of the picture shows the speed constraint capability. The velocity oscillation at $t=0.3$ is a confrontation to prevent crossing the obstacle and the desired trajectory. Comparing to the red line which is the velocity of the classic DMPs, the blue one can always stays within the constraints. It can be seen that velocity is able to stay within the limits comparing with the classic DMPs.

5. Conclusion

In this article, an IBLF constrained DMPs has been designed to generate the trajectory under limits. Our proposed controller guaranteed the state avoid the obstacle and velocity follow the bound. A control system involving IBLF-based slave controller and impedance master controller has been applied, and dynamic uncertainties are approximated by the RBFNN learning method. The proposed controller guaranteed the constrained performance in task space and robustness of the controller. The effectiveness of the system has been verified on the touch robots experiment platform. In our future work, we will do further research on the full state constrain problem of constrained DMPs method and focus on varying constrained methods.

Author contributions

Qinchuan Li and Chenguang Yang conceived and designed the study. Donghao Shi conducted data gathering and performed statistical analyses. Donghao Shi and Zhenyu Lu wrote the article.

Financial support

The authors would like to thank the Key Research and Development Project of Zhejiang Province (Grant 2021C04017).

Competing interests

The authors declare no conflicts of interest exist.

Ethical approval

Not applicable.

References

Atkeson, C. G., Schaal, C. G. and Systems, A., “Learning from demonstration,” Robot. Auton. Syst. 47(2-3), 6567 (2004).Google Scholar
Schaal, S., Mohajerian, P. and Ijspeert, A. J. P. I. B. R., “Dynamics systems vs. optimal control - a unifying view,” Prog. Brain Res. 165, 425445 (2007).CrossRefGoogle ScholarPubMed
Tang, T., Lin, H. C., Zhao, Y., Fan, Y. and Tomizuka, M., “Teach Industrial Robots Peg-Hole-Insertion by Human Demonstration,” IEEE International Conference on Advanced Intelligent Mechatronics, (2016).CrossRefGoogle Scholar
Vogt, D., Stepputtis, S., Grehl, S., Jung, B. and Amor, H. B., A System for Learning Continuous Human-Robot Interactions from Human-Human Demonstrations, 2017 IEEE International Conference on Robotics and Automation (ICRA), (2017).CrossRefGoogle Scholar
Lioutikov, R., Neumann, G., Maeda, G., Peters, J. and IEEE, Probabilistic Segmentation Applied to an Assembly Task, 2015 IEEE-RAS 15th International Conference on Humanoid Robots (IEEE-RAS International Conference on Humanoid Robots), (2015) pp. 533540.Google Scholar
Moro, C., Nejat, G. and Mihailidis, A., “Learning and personalizing socially assistive robot behaviors to aid with activities of daily living,” ACM Trans. Human-Robot Interact. 7(2), 125 (2018).CrossRefGoogle Scholar
Xu, W., Chen, J., Lau, H. Y. K. and Ren, H., “Automate surgical tasks for a flexible serpentine manipulator via learning actuation space trajectory from demonstration,” IEEE Int. Conf. Robot. Autom., 44064413 (2016).Google Scholar
Osa, T., Harada, K., Sugita, N., Mitsuishi, M. and IEEE, Trajectory Planning under Different Initial Conditions for Surgical Task Automation by Learning from Demonstration, 2014 IEEE International Conference on Robotics and Automation (IEEE International Conference on Robotics and Automation ICRA), (2014) pp. 65076513.Google Scholar
Gams, A., Nemec, B., Ijspeert, A. J. and Ude, A., “Coupling movement primitives: Interaction with the environment and bimanual tasks,” IEEE Trans. Robot. 30(4), 816830 (2014).CrossRefGoogle Scholar
Argall, B. D., Chernova, S., Veloso, M. and Browning, B., “A survey of robot learning from demonstration,” Robot. Auton. Syst. 57(5), 469483 (2009).CrossRefGoogle Scholar
Losey, D. P. and O’Malley, M. K.. Learning the Correct Robot Trajectory in Real-Time From Physical Human Interactions, vol. 27. ACM, New York, NY, USA, (2019).Google Scholar
Nemec, B., Zlajpah, L., Slajpa, S., Piskur, J. and Ude, A., An Efficient PBD Framework for Fast Deployment of Bi-Manual Assembly Tasks, 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), (2018).CrossRefGoogle Scholar
Hagenow, M., Senft, E., Radwin, R., Gleicher, M. and Zinn, M., “Corrective shared autonomy for addressing task variability,” IEEE Robot. Autom. Lett. 6(2), 11 (2021).CrossRefGoogle ScholarPubMed
Sheridan, T. B. J. M. P., Telerobotics, automation, and human supervisory control, (1992).Google Scholar
Si, W. Y., Guan, Y. and Wang, N., “Adaptive compliant skill learning for contact-rich manipulation with human in the loop,” IEEE Robot. Autom. Lett. 7(3), 58345841 (2022).CrossRefGoogle Scholar
Yang, C., Huang, D., He, W., Cheng, L. J. I. T. O. N. N. and Systems, L., “Neural control of robot manipulators with trajectory tracking constraints and input saturation,” IEEE Trans. Neural Netw. Learn. Syst. 99, 112 (2020).Google Scholar
Tee, K. P., Ge, S. S. and Tay, E. H., “Barrier Lyapunov functions for the control of output-constrained nonlinear systems,” Automatica 45(4), 918927 (2009).CrossRefGoogle Scholar
Jin, X., “Fault tolerant finite-time leader follower formation control for autonomous surface vessels with LOS range and angle constraints,” Automatica 68, 228236 (2016).CrossRefGoogle Scholar
Wei, H., Shuang, Z. and Ge, S. S. J. I. T. O. I. E., “Adaptive control of a flexible crane system with the boundary output constraint,” IEEE Trans. Ind. Electron. 61(8), 41264133 (2014).Google Scholar
He, W., Xue, C., Yu, X., Li, Z. and Yang, C. J. I. T. O. A. S., “Admittance-based controller design for physical human-robot interaction in the constrained task space,” IEEE Trans. Autom. Sci. Eng. 99, 113 (2020).Google Scholar
Calinon, S., D’Halluin, F., Sauser, E. L., Caldwell, D. G., Billard, A. G. J. R. and A. M. IEEE, “Learning and reproduction of gestures by imitation,” IEEE Robot. Autom. Mag. 17(2), 4454 (2010).CrossRefGoogle Scholar
Lu, Z., Wang, N. and Yang, C. J. I. A. T. o. M., “A constrained DMPs framework for robot skills learning and generalization from human demonstrations,” IEEE/ASME Trans. Mech. 99, 1 (2021).Google Scholar
Si, W., Wang, N. and Yang, C. J. N. C., “Composite dynamic movement primitives based on neural networks for human-robot skill transfer,Neural Comput. Appl. 5, 111 (2021).Google Scholar
Chen, Z., Huang, F., Sun, W., Gu, J. and Yao, B. J. I. A. T. o. M., “RBF neural network based adaptive robust control for nonlinear bilateral teleoperation manipulators with uncertainty and time delay,” IEEE/ASME Trans. Mech. 99, 1 (2019).Google Scholar
Figure 0

Figure 1. Block diagram illustrating the constrained system.

Figure 1

Figure 2. Experimental setup for testing.

Figure 2

Figure 3. Results of remote corrective control.

Figure 3

Figure 4. Motion generation of different methods.

Figure 4

Figure 5. Motion generation of different methods.