1. Introduction
Robots are now widely used in many fields, from industrial manufacturing to daily life, and they are no longer limited to structured environments and single, repetitive tasks. To reduce the programming effort required for different tasks, learning from demonstrations [Reference Atkeson, Schaal and Systems1] has been proposed. The dynamic movement primitives (DMPs) method [Reference Schaal, Mohajerian and Ijspeert2] is a flexible and effective way to transfer manipulation skills from humans to robots. DMPs can generalize the actions learned from demonstration and guarantee convergence to a goal position. The method has been successfully applied in many robotic scenarios, such as assembly operations [Reference Tang, Lin, Zhao, Fan and Tomizuka3–Reference Lioutikov, Neumann, Maeda and Peters5], robotic surgery [Reference Moro, Nejat and Mihailidis6–Reference Osa, Harada, Sugita and Mitsuishi8], and collaborative bimanual tasks [Reference Gams, Nemec, Ijspeert and Ude9].
However, many factors can cause task failures when using DMPs [Reference Argall, Chernova, Veloso and Browning10], including: (1) the expert cannot demonstrate the correct action for every possible state, and (2) the task environment may change in practice (e.g., new obstacles and manipulator constraints). To address these problems, some researchers have introduced human supervision as part of the system. In ref. [Reference Losey and O’Malley11], Losey and O’Malley used kinematic adjustments to successfully infer the parameters of an optimal policy. In ref. [Reference Nemec, Zlajpah, Slajpa, Piskur and Ude12], Nemec et al. proposed a learning-from-demonstration framework in which DMPs are updated from kinematic corrections applied to the behavior of an impedance-controlled robot. In ref. [Reference Hagenow, Senft, Radwin, Gleicher and Zinn13], Hagenow et al. proposed corrective shared autonomy, which leverages user input to address uncertainty in robot tasks by targeting corrections to task-specific variables. Hagenow et al. used supervisory control [Reference Sheridan14], a form of remote control in which responsibility remains with the human operator. Compared with correcting the robot directly, the remote approach is safer and is not restricted by physical space [Reference Si, Guan and Wang15]. In this article, the operator's corrections are likewise introduced through a remote control system.
Expert correction improves the quality of task completion, but it also raises new problems: (1) corrections imposed by the supervisor may drive the manipulator outside its constrained space, and (2) the intent behind a trajectory correction cannot be discerned directly. One effective way to deal with constraint problems is the barrier Lyapunov function (BLF). In ref. [Reference Yang, Huang, He, Cheng and Systems16], an asymmetric time-varying BLF is used to guarantee time-varying output constraints. In contrast to log-type [Reference Tee, Ge and Tay17] and tan-type [Reference Jin18] BLFs, an integral BLF can constrain the state signals directly rather than the error signals [Reference Wei, Shuang and Ge19]. The integral barrier Lyapunov function (IBLF) [Reference He, Xue, Yu, Li and Yang20] was proposed to keep the robot end-effector within the constrained task space. Accordingly, the IBLF is utilized in this article to keep the manipulator's end-effector within the restricted task space after the expert's corrections.
The supervisor's corrections are divided into two kinds: those that improve the quality of the demonstration and those that avoid collisions in a complex environment. A corrected motion can be used as a new, higher-quality demonstration, and collision information can be extracted from the correction. In ref. [Reference Calinon, D’Halluin, Sauser, Caldwell and Billard21], constraints for DMPs were successfully treated as point-like and volumetric obstacles. Based on the DMPs model and BLFs, Lu et al. [Reference Lu, Wang and Yang22] proposed a BLF-based DMPs framework with classified constraints.
The framework proposed in this article is illustrated in Fig. 1 and includes two parts: motion generation and single execution. Motion generation uses the classic DMPs method, extended with velocity limits inspired by ref. [Reference Si, Wang and Yang23], to learn the corrected path. The single-execution part uses a remote control system to make the manipulator's end-effector follow the learned trajectory, with a radial basis function neural network (RBFNN) employed to approximate the unknown robot dynamics. The trajectory is modified according to the expert's correction information and the confined space.
The major contributions are as follows:
A novel supervision-based framework is proposed that considers both motion learning and task execution, including a modified DMPs method and a remote-control system for supervision.
A new framework combining DMPs and IBLF is proposed to solve the constrained trajectory planning problem based on the correction, while also satisfying velocity limits.
A modified remote-control system is introduced in which the system constraints are enforced via the IBLF, and the stability of the system is established using the Lyapunov stability theorem.
The rest of this paper is organized as follows. Section 2 introduces the preliminaries, including DMPs and IBLF. Section 3 presents the learning process of the corrected motion and the remote corrective control system. Section 4 presents the experiments, and Section 5 concludes the paper.
2. Preliminary work
2.1. Dynamics modeling of remote system
The dynamics of the teleoperation system for master and slave in the task space can be modeled as follows:
where $x_{m}, \dot{x}_{m}, \ddot{x}_{m}$ and $x_{s}, \dot{x}_{s}, \ddot{x}_{s}$ are the end-effector position, velocity, and acceleration signals of the master and slave manipulators, $M_{m}(x_{m})$ and $M_{s}(x_{s})$ are the inertia matrices, $C_{m}(x_{m},\dot{x}_{m})$ and $C_{s}(x_{s},\dot{x}_{s})$ account for the centrifugal/Coriolis terms, $G_{m}(x_{m})$ and $G_{s}(x_{s})$ are the gravity vectors, $D_{m}$ and $D_{s}$ lump the modeling errors and external disturbances, and $f_{m}$ and $f_{s}$ are the control inputs of the master and slave devices.
Property 1: The matrices $\dot{M}_{m} - 2C_{m}$ and $\dot{M}_{s} - 2C_{s}$ are skew-symmetric.
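Property 1 can be checked numerically. The sketch below does so for the joint-space analogue of a hypothetical two-link planar arm: the lumped inertial parameters `a`, `b`, `c` are illustrative values, not Touch parameters, and the Coriolis matrix is built from Christoffel symbols, which is the choice that makes the property hold. It is an illustration under these assumptions, not part of the proposed method.

```python
import numpy as np

def two_link_M_C(q, dq, a=2.0, b=0.5, c=0.8):
    """Inertia M(q) and Coriolis matrix C(q, dq) of a hypothetical 2-link planar arm.

    a, b, c are lumped inertial parameters chosen only for illustration.
    C is built from Christoffel symbols, which is what makes Property 1 hold.
    """
    cos2, sin2 = np.cos(q[1]), np.sin(q[1])
    dq1, dq2 = dq
    M = np.array([[a + 2 * b * cos2, c + b * cos2],
                  [c + b * cos2, c]])
    C = np.array([[-b * sin2 * dq2, -b * sin2 * (dq1 + dq2)],
                  [b * sin2 * dq1, 0.0]])
    return M, C

def two_link_M_dot(q, dq, b=0.5):
    """dM/dt along the motion; only the q2-dependent entries change."""
    sin2, dq2 = np.sin(q[1]), dq[1]
    return np.array([[-2 * b * sin2 * dq2, -b * sin2 * dq2],
                     [-b * sin2 * dq2, 0.0]])

rng = np.random.default_rng(0)
q, dq = rng.normal(size=2), rng.normal(size=2)
M, C = two_link_M_C(q, dq)
N = two_link_M_dot(q, dq) - 2 * C
print(np.allclose(N, -N.T))  # True: M_dot - 2C is skew-symmetric
```

Skew-symmetry is what lets the terms $\frac{1}{2}z^{T}(\dot{M}-2C)z$ vanish in the Lyapunov derivatives used later.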
2.2. RBFNN
An RBFNN is used to approximate the manipulator uncertainties and thereby handle the uncertainty in the dynamic model. The RBF neural network utilized in this article is defined as
where $\vartheta$ denotes the input of the neural network, $W=[\omega _{1},\omega _{2},\ldots, \omega _{n}]^{T}$ is the ideal weight vector, $n$ is the number of RBFNN nodes, and $\varepsilon (\vartheta )$ is the approximation error. $S(\vartheta )=[s_{1}(\vartheta ),s_{2}(\vartheta ),\ldots, s_{n}(\vartheta )]^{T}$ is the vector of Gaussian basis functions, whose elements take the form
where $\mathrm{c}_{i}$ and $\mathrm{b}_{i}$ are the center and width of the $i$th neuron. The ideal weight vector $W^{*}$ is an artificial quantity introduced for analysis; it is the value of $W$ that minimizes the approximation error $|\varepsilon (\vartheta )|$ over a compact set of inputs.
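A minimal sketch of RBFNN approximation, assuming the commonly used Gaussian form $s_{i}(\vartheta )=\exp (-\|\vartheta -\mathrm{c}_{i}\|^{2}/\mathrm{b}_{i}^{2})$. The target function, the number of nodes, and the widths are hypothetical, and an offline least-squares fit stands in for the ideal weights $W^{*}$ (the controller itself adapts the weights online instead).

```python
import numpy as np

def gaussian_basis(theta, centers, widths):
    """S(theta): s_i = exp(-||theta - c_i||^2 / b_i^2) for each neuron i."""
    diff = theta[None, :] - centers                  # shape (n, dim)
    return np.exp(-np.sum(diff**2, axis=1) / widths**2)

def target(t):
    """Hypothetical 'unknown' smooth function to be approximated."""
    return np.sin(2 * np.pi * t) + 0.5 * t

n = 15
centers = np.linspace(0.0, 1.0, n)[:, None]          # c_i spread over the input range
widths = np.full(n, 0.1)                             # b_i

samples = np.linspace(0.0, 1.0, 200)
Phi = np.stack([gaussian_basis(np.array([t]), centers, widths) for t in samples])
# Least squares gives a stand-in for W*, the weights with minimal approximation error.
W_star, *_ = np.linalg.lstsq(Phi, target(samples), rcond=None)

approx = Phi @ W_star
max_err = np.max(np.abs(approx - target(samples)))
print(f"max |epsilon| on the samples: {max_err:.2e}")
```

More nodes or different widths trade a smaller residual $\varepsilon$ against more weights that must be adapted online.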
2.3. General DMPs model
DMPs are used for trajectory learning. A DMP consists of two main components: (1) a spring-damper-type equation that draws the system to the target and (2) a forcing term that shapes the desired behavior.
The DMPs model is first introduced as
where $\alpha _{z},$ $\beta _{z}$ are positive parameters, $x$ is the position variable, $g$ is the goal point, $v$ is the velocity, $\dot{v}$ is the acceleration, $\tau \gt 0$ is the time constant, $s$ is a phase variable that avoids explicit time dependence and whose initial value is set to 1, and $\alpha _{s}$ is a factor that modifies the convergence time.
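The sketch below integrates a one-dimensional DMP with the forcing term switched off, assuming the standard form $\tau \dot{v}=\alpha _{z}(\beta _{z}(g-x)-v)+f(s)$, $\tau \dot{x}=v$, $\tau \dot{s}=-\alpha _{s}s$. It uses the common parameters reported in Section 4.2 ($\alpha _{s}=5, \alpha _{z}=10, \beta _{z}=100$) and illustrates that the spring-damper part draws $x$ to $g$ while the phase decays toward 0.

```python
import numpy as np

def rollout_dmp(x0, g, tau=1.0, alpha_z=10.0, beta_z=100.0, alpha_s=5.0,
                forcing=lambda s: 0.0, dt=1e-3, T=3.0):
    """Integrate a 1-D DMP, assumed to take the standard form
        tau * dv/dt = alpha_z * (beta_z * (g - x) - v) + f(s)
        tau * dx/dt = v
        tau * ds/dt = -alpha_s * s,   s(0) = 1
    The phase uses its exact exponential solution; x and v use semi-implicit Euler."""
    x, v, s = x0, 0.0, 1.0
    decay = np.exp(-alpha_s / tau * dt)
    xs, ss = [x], [s]
    for _ in range(int(round(T / dt))):
        v += (alpha_z * (beta_z * (g - x) - v) + forcing(s)) / tau * dt
        x += v / tau * dt          # uses the updated v (semi-implicit Euler)
        s *= decay
        xs.append(x)
        ss.append(s)
    return np.array(xs), np.array(ss)

xs, ss = rollout_dmp(x0=0.0, g=1.0)
print(xs[-1], ss[-1])  # x settles at g = 1; s has decayed close to 0
```

Under the assumed form and these gains, the unforced system is underdamped (damping ratio $\alpha _{z}/(2\sqrt{\alpha _{z}\beta _{z}})\approx 0.16$), so $x$ overshoots $g$ before settling; the learned forcing term is what shapes the transient.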
The forcing term $f(s)$ is defined as
The forcing term is a weighted combination of N Gaussian basis functions, which enables the encoding of the demonstrated trajectory. Here $y_{0}$ is the starting position, $w_{i}$ is the $i$th element of the weight vector, and $\psi _{i}$ is a Gaussian radial basis function, $\psi _{i}(s)=\exp\! ({-}h_{i}(\mathrm{s}-\mathrm{c}_{i})^{2})$, where $\mathrm{c}_{i}$ is the center of the Gaussian kernel and $h_{i}$ determines its width. The weights $w_{i}$ can be learned with supervised learning algorithms such as locally weighted regression.
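A sketch of how the forcing term can be learned with locally weighted regression (LWR). It assumes the standard formulation $f(s)=\frac{\sum _{i}\psi _{i}(s)w_{i}}{\sum _{i}\psi _{i}(s)}s(g-y_{0})$, the target $f^{t}=\tau \dot{v}-\alpha _{z}(\beta _{z}(g-x)-v)$ with $v=\tau \dot{x}$, a hypothetical minimum-jerk demonstration, and kernel centers spaced evenly in time; the number of kernels and the width heuristic are illustrative choices.

```python
import numpy as np

# Common DMP parameters from Section 4.2; tau and the number of kernels are illustrative.
alpha_z, beta_z, alpha_s, tau = 10.0, 100.0, 5.0, 1.0
n_basis = 50

# Hypothetical demonstration: a minimum-jerk move from y0 = 0 to g = 1 in 1 s.
t = np.linspace(0.0, 1.0, 1001)
y0, g = 0.0, 1.0
y = y0 + (g - y0) * (10 * t**3 - 15 * t**4 + 6 * t**5)
yd = (g - y0) * (30 * t**2 - 60 * t**3 + 30 * t**4)
ydd = (g - y0) * (60 * t - 180 * t**2 + 120 * t**3)

# Phase of the canonical system, s(t) = exp(-alpha_s * t / tau).
s = np.exp(-alpha_s * t / tau)
# Target forcing term f^t = tau*dv/dt - alpha_z*(beta_z*(g - x) - v), with v = tau*dx/dt.
f_target = tau**2 * ydd - alpha_z * (beta_z * (g - y) - tau * yd)

# Kernels psi_i(s) = exp(-h_i (s - c_i)^2); centers evenly spaced in time.
c = np.exp(-alpha_s * np.linspace(0.0, 1.0, n_basis))
gaps = np.abs(np.diff(c))
h = 1.0 / np.append(gaps, gaps[-1]) ** 2

# LWR: one weighted least-squares fit per kernel, w_i = sum(psi xi f^t) / sum(psi xi^2).
psi = np.exp(-h * (s[:, None] - c) ** 2)          # shape (samples, n_basis)
xi = s * (g - y0)
w = (psi * (xi * f_target)[:, None]).sum(axis=0) / (psi * (xi**2)[:, None]).sum(axis=0)

def forcing(s_val):
    """f(s) = sum_i psi_i(s) w_i / sum_i psi_i(s) * s * (g - y0)."""
    p = np.exp(-h * (s_val - c) ** 2)
    return p @ w / p.sum() * s_val * (g - y0)

# Reproduce the demonstration: exact phase update, semi-implicit Euler for x and v.
dt = t[1] - t[0]
x, v, s_val = y0, 0.0, 1.0
xs = [x]
for _ in range(len(t) - 1):
    v += (alpha_z * (beta_z * (g - x) - v) + forcing(s_val)) / tau * dt
    x += v / tau * dt
    s_val *= np.exp(-alpha_s / tau * dt)
    xs.append(x)
xs = np.array(xs)
print(f"max deviation from the demonstration: {np.max(np.abs(xs - y)):.4f}")
```

Because each weight is fitted independently, LWR needs no matrix inversion; fewer kernels make the fit cheaper but blur local features of the demonstration.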
The weights are computed by minimizing the following error function:
where $f(s)$ is the forcing term produced by the weights and $f^{t}(s)$ is the target value computed from the demonstrated trajectory as follows:
2.4. General IBLF
Consider the strict feedback nonlinear system described as
where $f_{1},f_{2},g_{1},g_{2}$ are smooth functions, $x_{1},x_{2}$ are the states, and $u$ and $y$ are the input and output.
Consider the following IBLF candidate:
where $k_{c}$ is a positive constant that bounds the state $x_{1}$, $x_{r}$ is the reference state, $\rho$ is the integration variable, and the error variables are $z_{1}= x_{1}-x_{r}$ and $z_{2}= x_{2}-\alpha$, with $\alpha$ a continuously differentiable function.
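Assuming the IBLF takes the commonly used form $V_{1}=\int _{0}^{z_{1}}\rho \frac{k_{c}^{2}}{k_{c}^{2}-(\rho +x_{r})^{2}}d\rho$, it has a closed form, and the sketch below cross-checks that closed form against numerical quadrature. The value $k_{c}=8$ follows Section 4.2; $x_{r}=2$ is hypothetical.

```python
import numpy as np

def iblf_closed_form(z1, x_r, k_c):
    """V1 = int_0^{z1} rho * k_c^2 / (k_c^2 - (rho + x_r)^2) d(rho), in closed form.

    Valid while |x_r| < k_c and |x1| = |z1 + x_r| < k_c; V1 -> infinity as |x1| -> k_c.
    """
    x1 = z1 + x_r
    term1 = 0.5 * k_c**2 * np.log((k_c**2 - x_r**2) / (k_c**2 - x1**2))
    term2 = 0.5 * k_c * x_r * np.log((k_c + x1) * (k_c - x_r)
                                     / ((k_c - x1) * (k_c + x_r)))
    return term1 - term2

def iblf_midpoint(z1, x_r, k_c, n=20000):
    """The same integral by the composite midpoint rule, as a numerical cross-check."""
    step = z1 / n
    rho = (np.arange(n) + 0.5) * step
    return np.sum(rho * k_c**2 / (k_c**2 - (rho + x_r) ** 2)) * step

k_c, x_r = 8.0, 2.0   # k_c = 8 as in Section 4.2; x_r = 2 is a hypothetical reference
for z1 in (-3.0, 1.0, 5.0, 5.9, 5.99):
    print(f"z1={z1:5.2f}  closed={iblf_closed_form(z1, x_r, k_c):9.4f}  "
          f"numeric={iblf_midpoint(z1, x_r, k_c):9.4f}")
```

The printed values grow rapidly as $z_{1}\to k_{c}-x_{r}=6$, that is, as $x_{1}\to k_{c}$. This barrier is what keeps $x_{1}$ inside $(-k_{c},k_{c})$ when the controller keeps $\dot{V}$ non-positive, while near $z_{1}=0$ the function behaves like an ordinary quadratic.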
The time derivative of $V_{1}$ is given by
where
The virtual control variable $\alpha$ can be designed as follows:
where $k_{1}$ is a positive constant. Substituting (15) into (12), we obtain
Then, a Lyapunov function candidate is chosen as follows:
The time derivative of $V_{2}$ is given by
The control law is designed as
3. Proposed framework
A novel approach that combines DMPs and IBLF is introduced to address constrained trajectory planning based on the correction. The remote-control system is designed to guarantee that the manipulator end-effector stays within the restricted space and to let the expert provide corrective information. The master and slave controllers are designed and analyzed.
3.1. DMPs based on IBLF
The DMPs in (6) can be rewritten as the following nonlinear system:
where $x_{1},x_{2},g_{1}$ represent $x$, $v$, and $1/\tau$ in Eq. (5), $f_{2}=(\alpha _{z}(\beta _{z}(g-x)-v)+\gamma \sigma (v))/\tau$, $u$ is the forcing function $f(s)$, $\Delta u$ is the term added by the IBLF, and $\gamma, \gamma _{0}, \gamma _{1}$ are positive constants. To allow the velocity to approach its limit without exceeding it, $\sigma _{i}$ is designed as
where $A_{i}$ , $B_{i}$ are the positive constants.
As in the general form of the IBLF, we define $z_{1}=x_{1}-x_{r}$ and $z_{2}=x_{2}-\alpha$, where $x_{r}$ is the desired state of the control system. Since DMPs have no such state, the motion generated by the DMPs method without any limits is used as the desired state:
Theorem 1: For the DMPs function represented by (20), the output constraint is never violated and all closed-loop signals remain bounded if the following conditions are met:
where $\lambda$ is introduced in (14)
To obtain $\Delta u$, we first compute $u$ without the added term:
So
Proof:
We synthesize a Lyapunov function as
where $V_{1}$ is the IBLF candidate in (11)
Then
Substituting (16) into (20), we have
where $k_{1},k_{2}$ are positive numbers. According to the Lyapunov stability theorem, the above expression guarantees global stability and global tracking convergence of the system.
3.2. Master controller design
The master robot is driven by an impedance controller so that the expert can move its end-effector. To model the operating torque, a damping-stiffness model is considered in this paper:
where $k_{m}$ and $d_{m}$ denote the designed stiffness and damping matrices, $e_{m}=x_{md}-x_{m}$, and $x_{md}$ is the desired state, which is a fixed value in this remote corrective system. To prevent the slave robot from moving off the edge of the surface during control, a variable stiffness is proposed:
where $\alpha$ measures the proximity to the boundary and approaches 1 as the end-effector gets closer to the boundary, $d(x_{m})$ is the distance to the nearest edge, and $k_{\textit{max} }, k_{\textit{min} }$ are positive constants.
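The variable-stiffness idea can be sketched as a linear interpolation between $k_{min}$ and $k_{max}$ driven by the proximity $\alpha$. The exponential proximity map, its length scale `d0`, the interpolation rule, and the unit damping are hypothetical choices, since only the qualitative behavior of $\alpha$ is specified here; $k_{min}=10$ and $k_{max}=100$ follow Section 4.1.

```python
import numpy as np

K_MIN, K_MAX = 10.0, 100.0   # values used for the master robot in Section 4.1

def proximity(d, d0=10.0):
    """Hypothetical proximity alpha in (0, 1]: equals 1 on the boundary (d = 0)
    and decays as the distance d to the nearest edge grows."""
    return np.exp(-np.maximum(d, 0.0) / d0)

def variable_stiffness(d, d0=10.0):
    """Interpolate between k_min (far from the edge) and k_max (at the edge)."""
    return K_MIN + proximity(d, d0) * (K_MAX - K_MIN)

def restoring_force(e_m, de_m, d, damping=1.0):
    """Assumed damping-stiffness force k(d) * e_m + d_m * de_m toward x_md."""
    return variable_stiffness(d) * e_m + damping * de_m

for d in (0.0, 5.0, 20.0, 50.0):
    print(d, round(float(variable_stiffness(d)), 2))
```

Near the middle of the workspace the operator feels only the soft spring $k_{min}$, and the resistance stiffens smoothly toward $k_{max}$ as an edge is approached.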
Then, the control torque can be designed as
where $u_{m}$ is the robust term and $\hat{G}_{m}$ is the estimate of $G_{m}$, which satisfies $G_{m}-\hat{G}_{m}=\varepsilon _{m}$.
3.3. Slave controller design
Inspired by ref. [Reference Chen, Huang, Sun, Gu and Yao24], the desired position signal $x_{sd}$ can be derived via the slave trajectory generator. For simplicity, the following filter is used:
It takes the delayed master position $x_{m}(t - T(t))$ as input to create the correction state $x_{sc}$. Then $x_{sd}=\Delta x_{sd}+x_{sdmp}$, where $\Delta x_{sd}=x_{sc}-x_{sd}$ is computed with the forward kinematic function of the Touch and $x_{sdmp}$ is the path generated by DMPs.
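One possible filter is a first-order low-pass filter driven by the delayed master position. The sketch below assumes that form, a constant delay $T$ instead of $T(t)$, a hypothetical cutoff `omega`, and a step correction chosen for illustration; the actual filter may differ.

```python
import numpy as np
from collections import deque

def filtered_correction(x_m_samples, dt, delay, omega=5.0):
    """Hypothetical first-order low-pass filter x_sc' = omega * (x_m(t - T) - x_sc),
    driven by the master position delayed by a constant T (forward Euler steps)."""
    lag = int(round(delay / dt))
    buffer = deque([x_m_samples[0]] * (lag + 1), maxlen=lag + 1)
    x_sc = float(x_m_samples[0])
    out = []
    for x_m in x_m_samples:
        buffer.append(x_m)
        delayed = buffer[0]                 # sample taken `lag` steps ago
        x_sc += omega * (delayed - x_sc) * dt
        out.append(x_sc)
    return np.array(out)

dt = 1e-3
x_m = np.zeros(3000)
x_m[500:] = 10.0          # a step correction applied on the master at t = 0.5 s
x_sc = filtered_correction(x_m, dt, delay=0.1)
print(x_sc[599], x_sc[600], x_sc[-1])
```

The slave sees nothing until the delay has elapsed and then follows the correction smoothly, which avoids injecting abrupt operator motions into $x_{sd}$.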
Because of disturbances and tracking errors, the trajectory produced by the generator cannot guarantee that $x$ stays in the constraint space at all times. A soft saturation function is therefore applied: if $x$ crosses the bound, it is mapped back to a state inside. The following saturation function guarantees that the reference trajectory stays inside the limited region:
where $\eta$ is a constant very close to 1.
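A minimal soft-saturation sketch, assuming a hypothetical piecewise form: the identity inside a "knee" and a tanh blend beyond it, so that the output never exceeds $\eta k_{c}$. The values $k_{c}=60$ and $\eta =0.98$ follow Section 4.1; `knee` is an assumption.

```python
import numpy as np

def soft_saturate(x, k_c=60.0, eta=0.98, knee=0.8):
    """Hypothetical soft saturation applied per axis.

    Identity for |x| <= knee * eta * k_c; beyond the knee the magnitude is blended
    with tanh so that it approaches, but never exceeds, eta * k_c < k_c.
    """
    bound = eta * k_c
    x0 = knee * bound
    span = bound - x0
    x = np.asarray(x, dtype=float)
    mag = np.abs(x)
    out = np.where(mag <= x0, mag, x0 + span * np.tanh((mag - x0) / span))
    return np.sign(x) * out

x_ref = np.array([10.0, 55.0, 75.0])   # e.g. a reference component overshooting y = 60
print(soft_saturate(x_ref))
```

Unlike a plain $\eta k_{c}\tanh (x/(\eta k_{c}))$, this form leaves references well inside the bound unchanged; the slope of the tanh branch is 1 at the knee, so the map stays continuously differentiable.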
Because of uncertainties in the robot dynamics and model, the reference trajectory alone cannot ensure that the end-effector stays in the constrained space. The IBLF method is therefore introduced to ensure that the constraints of the predefined task space are met.
The dynamics of the slave system can be written as
Then, we define the error variables $z_{1}=x_{1}-x_{sd}$ and $z_{2}=x_{2}-\alpha$, where $\alpha$ denotes the virtual control variable, designed as
where $k_{s1}$ is a positive number, $k_{c}$ is the limit in Cartesian space, and $\lambda$ is given in (14).
The control law is designed as
In the actual system, the dynamics parameters are typically unknown, so an RBFNN is used to approximate the manipulator uncertainties in the dynamic model. The input of the neural network on the slave side can be chosen as $Z=[x_{s},\dot{x}_{s},\alpha,\dot{\alpha}]$. The RBF neural network is defined as
3.4. Stability analysis
Define the Lyapunov functions of the master and slave as
where $\tilde{W}=\hat{W}-W$
The Lyapunov candidate function of the whole system is
Then, the derivative of $\mathrm{V}_{m}$ can be calculated as
Substituting (31) and (36) into (42), we have
then
Furthermore, the robust term of the controller, which handles the estimation error, external disturbances, and modeling error, can be designed as
where $\| D_{m}\| \leq d_{m}$, $\| \varepsilon _{m}\| \leq \varepsilon _{mb}$, and $d_{m}$ and $\varepsilon _{mb}$ are positive constants.
where $\| D_{s}\| \leq d_{s}$, $\| \varepsilon _{s}\| \leq \varepsilon _{sb}$, and $d_{s}$ and $\varepsilon _{sb}$ are positive constants.
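The role of the robust term can be illustrated with one common choice, $u=-(d+\varepsilon _{b})z/\|z\|$, which is an assumption here because the designed expression is not reproduced in this section. By the Cauchy-Schwarz inequality, $z^{T}(\delta +u)\leq \|z\|(\|\delta \|-(d+\varepsilon _{b}))\leq 0$ whenever the lumped disturbance satisfies $\|\delta \|\leq d+\varepsilon _{b}$. The sketch checks this numerically with the bound 0.1 used in Section 4.1.

```python
import numpy as np

def robust_term(z, bound=0.1, eps=1e-9):
    """Hypothetical robust term u = -(d + eps_b) * z / ||z||, with bound = d + eps_b.

    eps avoids division by zero at z = 0 (a common smoothing of the discontinuity).
    """
    z = np.asarray(z, dtype=float)
    return -bound * z / (np.linalg.norm(z) + eps)

rng = np.random.default_rng(1)
worst = -np.inf
for _ in range(10000):
    z = rng.normal(size=3)                       # error signal of a 3-DoF device
    delta = rng.normal(size=3)
    delta *= 0.1 * rng.uniform() / np.linalg.norm(delta)   # ||delta|| <= d + eps_b
    worst = max(worst, float(z @ (delta + robust_term(z))))
print(f"largest z^T (delta + u) over the samples: {worst:.3e}")  # never positive
```

Near $z=0$ such a term switches direction abruptly, so in practice it is often smoothed (for example with `eps` or a boundary layer) to limit chattering.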
Thus, the adaptive law, which estimates the RBF neural network weights online in real time, can be designed as
then
Since the Lyapunov function is positive definite and its derivative is negative definite, the Lyapunov stability theorem ensures global stability and global tracking convergence of the system under the proposed controller.
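To show how an adaptive law estimates RBF weights online, the sketch below simulates a hypothetical one-dimensional plant $\dot{x}=f(x)+u$ with unknown $f$, a controller $u=-kz-\hat{W}^{T}S(x)+\dot{x}_{d}$, and a $\sigma$-modified law $\dot{\hat{W}}=\Gamma (S(x)z-\sigma \hat{W})$. With $f=W^{T}S+\varepsilon$, $V=\frac{1}{2}z^{2}+\frac{1}{2}\tilde{W}^{T}\Gamma ^{-1}\tilde{W}$, and $\tilde{W}=\hat{W}-W$, this choice gives $\dot{V}=-kz^{2}+z\varepsilon -\sigma \tilde{W}^{T}\hat{W}$. The plant, gains, centers, and $\sigma$-modification are assumptions for illustration (a larger $\Gamma$ than the experimental value speeds up learning); this is not the paper's slave-side law.

```python
import numpy as np

CENTERS = np.linspace(-1.5, 1.5, 11)   # hypothetical RBF centers over the state range
WIDTH = 0.4

def S(x):
    """Gaussian regressor vector S(x)."""
    return np.exp(-(x - CENTERS) ** 2 / WIDTH**2)

def unknown_drift(x):
    """Stand-in for the unknown dynamics the RBFNN must learn (hypothetical)."""
    return 0.5 * x**2 + np.sin(2 * x)

def simulate(gamma, k=10.0, sigma=0.01, dt=1e-3, T=20.0):
    """Plant x' = f(x) + u, tracking x_d = sin(t), with
    u = -k z - W_hat^T S(x) + xd' and W_hat' = gamma * (S(x) z - sigma * W_hat)."""
    x, W_hat = 0.0, np.zeros_like(CENTERS)
    errors = []
    for i in range(int(round(T / dt))):
        t = i * dt
        xd, dxd = np.sin(t), np.cos(t)
        z = x - xd
        s = S(x)
        u = -k * z - W_hat @ s + dxd
        W_hat += gamma * (s * z - sigma * W_hat) * dt
        x += (unknown_drift(x) + u) * dt
        errors.append(z)
    return np.array(errors), W_hat

err_fixed, _ = simulate(gamma=0.0)       # no adaptation: the NN output stays zero
err_adapt, W_hat = simulate(gamma=50.0)
tail = 2000                              # last 2 s at dt = 1e-3
print(np.mean(np.abs(err_fixed[-tail:])), np.mean(np.abs(err_adapt[-tail:])))
```

With adaptation switched off, the tracking error settles at roughly $f(x)/k$; with it on, the weights absorb $f$ along the visited part of the state space and the residual error shrinks, while the $\sigma$ term keeps $\hat{W}$ bounded.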
4. Experiment
The proposed method has been verified on two Touch robots, which are haptic devices manufactured by 3D Systems. As shown in Fig. 2, the device has three degrees of freedom and can be driven by a torque controller. In our experiments, the two Touch robots served as the master and the slave. Their dynamics can be modeled as (1) and (2), where the unknown dynamic parameters are estimated by the RBF neural network during teleoperation. The proposed method is tested by two groups of experiments:
Controller test: the slave robot moves along the learned trajectory to accomplish the task, and the human operator modifies the trajectory by operating the master robot in a remote environment.
Trajectory-learning test: the IBLF-based DMPs are used to generate a learned trajectory, considering both task-space and velocity constraints.
4.1. Constrained control effect
In this part, the designed controllers are applied to verify the proposed algorithm on the two Touch robots. First, the operator provides a demonstration trajectory; then the slave robot moves along the trajectory while the master robot is operated to modify the slave robot's trajectory.
For the master robot, the parameters are chosen as $k_{min}=10, k_{\max }=100, d_{m}+\varepsilon _{mb}=0.1, D=1,$ ${\Gamma} _{m}=1$, and $x_{md}=[50,0,60]$, which is near the middle of its workspace. For the slave robot, the control parameters are chosen as $k_{s1}=10, k_{c}=60, k_{s2}=15, d_{s}+\varepsilon _{sb}=0.1$, and ${\Gamma} _{s}=1$. The slave robot receives the filtered correction information from the master and then applies the soft saturation function to generate the desired end-effector trajectory in task space. The parameter of the soft saturation function is set to $\eta =0.98$, the weights of the RBF NN are initialized to 0, and the centers of the basis functions are distributed in the interval [0, 1]. The operator then moves the master device toward the boundaries along the y-axis in turn.
The experimental results are shown in Fig. 3. The proposed controller ensures that the end-effector tracks the reference trajectory in real time while operating inside the restricted area. The operator feels resistance when moving away from the hold position, which prevents accidental contact. When the operator forced the slave to cross the bound $y=60$, the soft saturation function and the IBLF controller worked together to keep the end-effector below the limit.
4.2. Trajectory learning
The second group of experiments tests the IBLF-based motion model, including its ability to enforce task-space and velocity constraints. A drawing task is designed for the test. The common DMPs parameters are chosen as $\alpha _{s}=5, \alpha _{z}=10, \beta _{z}=100$, and the others are set separately in each experiment.
To validate the correction performance of the IBLF-based DMPs, the IBLF parameters are set to $k_{1}=1, k_{2}=10, k_{c}=8$. The speed-constraint term is added to the IBLF-based method with parameters $A=100, B=-100, \gamma =10, \gamma _{0}=5, \gamma _{1}=10$.
As shown in Fig. 4, the trajectory was corrected through the teleoperation equipment during the operation, and an obstacle was added early in the experiment to keep the movement within the edge. The corrected trajectory is then learned, with the aim that the learned trajectory does not cross the limit inferred from the correction.
The corrected trajectory is learned by both the classic DMPs method and the modified DMPs method proposed in this article, and the two are compared in Fig. 5. The top panel shows the generalization process: both methods capture the characteristics of the trajectory and finally approach the target point. However, the red line (classic DMPs) crosses the limit around $x=-50$, while the blue line (IBLF-based DMPs) always stays within the constraints. The middle panel, which plots position against time, shows this more clearly: the red line crosses the border at $t=0.4$. Compared with the classic method, the IBLF-based method keeps the motion within the set range. However, because only a few neural network nodes were selected, local features cannot be perfectly reproduced. The bottom panel shows the velocity-constraint capability. The velocity oscillation at $t=0.3$ results from the conflict between avoiding the obstacle and following the desired trajectory. In contrast to the red line, which shows the velocity of the classic DMPs, the blue line always stays within the velocity limits.
5. Conclusion
In this article, an IBLF-constrained DMPs method has been designed to generate trajectories under limits. The proposed method keeps the state clear of the obstacle and the velocity within its bound. A control system comprising an IBLF-based slave controller and an impedance-based master controller has been applied, with the dynamic uncertainties approximated by the RBFNN learning method. The proposed controller guarantees the constrained performance in task space and the robustness of the control system. The effectiveness of the system has been verified on the Touch robot experimental platform. In future work, we will further study the full-state constraint problem of the constrained DMPs method and focus on methods for varying constraints.
Author contributions
Qinchuan Li and Chenguang Yang conceived and designed the study. Donghao Shi conducted data gathering and performed statistical analyses. Donghao Shi and Zhenyu Lu wrote the article.
Financial support
The authors would like to thank the Key Research and Development Project of Zhejiang Province (Grant 2021C04017).
Competing interests
The authors declare no conflicts of interest exist.
Ethical approval
Not applicable.