A reinforcement learning fuzzy system for continuous control in robotic odor plume tracking

Xinxing Chen; Bo Yang; Jian Huang; Yuquan Leng; Chenglong Fu

doi:10.1017/S0263574722001321


Published online by Cambridge University Press: 19 September 2022


Xinxing Chen: Affiliation:
Shenzhen Key Laboratory of Biomimetic Robotics and Intelligent Systems, Shenzhen, 518055, China; Guangdong Provincial Key Laboratory of Human-Augmentation and Rehabilitation Robotics in Universities, Southern University of Science and Technology, Shenzhen, 518055, China
Bo Yang: Affiliation:
Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Jian Huang: Affiliation:
Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Yuquan Leng: Affiliation:
Shenzhen Key Laboratory of Biomimetic Robotics and Intelligent Systems, Shenzhen, 518055, China; Guangdong Provincial Key Laboratory of Human-Augmentation and Rehabilitation Robotics in Universities, Southern University of Science and Technology, Shenzhen, 518055, China
Chenglong Fu*: Affiliation:
Shenzhen Key Laboratory of Biomimetic Robotics and Intelligent Systems, Shenzhen, 518055, China; Guangdong Provincial Key Laboratory of Human-Augmentation and Rehabilitation Robotics in Universities, Southern University of Science and Technology, Shenzhen, 518055, China
*Corresponding author. E-mail: fucl@sustech.edu.cn


Abstract

In dynamic outdoor environments characterized by turbulent airflow and intermittent odor plumes, robotic odor plume tracking remains challenging because existing algorithms rely heavily on manual tuning or on learning from expert experience, both of which are hard to apply in an unknown environment. In this paper, a multi-continuous-output Takagi–Sugeno–Kang fuzzy system was designed and tuned with reinforcement learning to solve the robotic odor source localization problem in dynamic odor plumes. Built on the Lévy Taxis plume tracking controller, the proposed fuzzy system determined the parameters of the controller from the robot’s observations and guided the robot to turn and move towards the odor source at each searching step. The trained fuzzy system was tested in simulated filament-based odor plumes dispersed by a changing wind field. The results showed that the proposed fuzzy system-based controller trained with reinforcement learning achieved a similar success rate and higher efficiency compared with a manually tuned, well-designed fuzzy system-based controller. The fuzzy system-based plume tracking controller was also validated through real robotic experiments.

Keywords

reinforcement learning; fuzzy inference system; odor plume tracking; robotic olfaction; dynamic airflow

Type: Research Article
Information: Robotica, Volume 41, Issue 3, March 2023, pp. 1039–1054

DOI: https://doi.org/10.1017/S0263574722001321
Copyright: © The Author(s), 2022. Published by Cambridge University Press

1. Introduction

Tracking odor/gas plumes and searching for the release source with mobile robots offers advantages in various scenarios, for example, searching for toxic gas leakage. Compared with professional firefighters and search animals, robots do not require much training, can keep working for long periods, and are less threatened by dangerous surroundings [Reference Chen and Huang1–Reference Ma, Mao, Tan, Gao, Zhang and Xie3].

Robotic odor source localization utilizes a mobile robot or a team of robots to search for the odor release source. The robots usually integrate an odor concentration sensor and a wind sensor to perceive the environment and carry out some bio-inspired or information-gathering behaviors, including chemotaxis (climbing the odor concentration gradient) [Reference Larsch, Flavell, Liu, Gordus, Albrecht and Bargmann4], anemotaxis (moving upwind if odor plumes are detected) [Reference Chen and Huang5], and Infotaxis [Reference Vergassola, Villermaux and Shraiman6, Reference Chen, Marjovi, Huang and Martinoli7].

Conventional odor source searching behaviors have shown their capability in laminar and steady airflows, because the odor concentration there has a smooth gradient and can be modeled relatively easily as a pseudo-Gaussian distribution [Reference Arya8]. However, the odor concentration distribution in turbulent airflows can hardly be calculated analytically, so it is hard to locate the odor source by following a smooth concentration gradient. The robot is very likely to lose the plumes and wander in the searching area.

In recent years, thanks to the development of hardware with higher computational capabilities, some fuzzy logic control methods [Reference Chen and Huang9–Reference Wang, Pang and Li11] and learning-based methods [Reference Wang, Pang and Li12–Reference Hu, Song and Chen14] have been proposed for odor source localization in complex and dynamic environments. Fuzzy logic control fuzzifies the measured signals with preset rules and is flexible enough for various application scenarios. However, the performance of a fuzzy controller depends heavily on its rules, which are hard to tune manually to an optimum. An Adaptive Neural Fuzzy Inference System can be utilized to tune the behavior rules by learning from preset searching strategies [Reference Wang and Pang10]. However, the performance of the preset searching strategies cannot be guaranteed when the robot works in an unknown environment where prior knowledge of the odor concentration and wind field is not available.

Existing reinforcement learning-based methods use neural networks to model an action-value function and let the robot act under an optimal action policy. Compared with fuzzy logic rules that are set manually or learned from preset behaviors, models trained well through deep reinforcement learning adapt better to unknown and complex environments. However, the interpretability of deep reinforcement learning remains a challenging problem. Learning models are essentially black boxes, because it is difficult to explain the knowledge acquired during the training process. The interpretability of learning models has attracted rapidly growing research interest in the past few years. Some researchers have adopted elaborately designed methods to explain existing learning algorithms. Others have tried to explain linguistically the knowledge a model has acquired during training. Takagi–Sugeno–Kang (TSK) fuzzy systems are a good alternative to neural networks for this purpose. Compared with neural networks [Reference Chen, Zhang, Leng, Chen and Fu15–Reference Fang, Long, Sun, Liu, Zhang and Fang18], fuzzy rules are closer to the human decision-making methodology [Reference Li, Cao and Ding19, Reference Yang, Jiang, Na, Li, Cheng and Su20] and can even be initialized with human prior knowledge to make the training process faster [Reference Chen, Leng and Fu21]. Human experts can thus intervene proactively in the design and tuning of TSK fuzzy systems, which is an advantage of TSK fuzzy systems over neural network models. Thanks to this flexibility, TSK fuzzy system-based controllers have been widely applied in robotics [Reference Salehi, Pishkenari and Zohoor22–Reference Veysi, Soltanpour and Khooban26].

To the best of our knowledge, no previous work has used a TSK fuzzy system to model an action policy for robotic odor plume tracking and tuned it with reinforcement learning. Since TSK fuzzy systems may suffer from weak generalization ability, poor training performance on big data, and a low convergence rate, some optimization methods [Reference Wu, Yuan, Huang and Tan27], including Layer Normalization (LN) and DropRule, are applied in the structure of the fuzzy system and in the training process. Moreover, the outputs of the proposed fuzzy system include multiple continuous variables, which differs from the discrete action spaces widely used in reinforcement learning [Reference Chen, Fu and Huang13]. This feature can make the fuzzy system-based controller more adaptive and speed up the odor source searching process in real-world scenarios.

The proposed reinforcement learning fuzzy system also shows promise for other robotic problems, for example, human-robot interactions [Reference Su, Qi, Schmirander, Ovur, Cai and Xiong28–Reference Fang, Ding, Sun, Shan, Wang, Wang and Zhang31], environment adaptation of robots [Reference Zhang, Luo, Xiao, Zhang, Liu, Zhu, Lu, Rong, de Silva and Fu32, Reference Chen, Chen, Wang, Yang, Ma, Leng and Fu33], calibration [Reference Guo, Song, Tang, Zhou and Jiang34–Reference Guo, Tang, Zhou, Song, Jiang, Xie and Ye36], and path planning and control [Reference Cao, Huang, Xiong, Wu, Zhang, Li and Hasegawa37–Reference Fang, Sun, Wu, Liu, Wang, Huang, Huang, Liu and Wen40]. The fuzzy system-based controllers can be initialized with expert knowledge, trained with reinforcement learning in simulated scenarios, and finally adapted to the real world.

The contributions of this paper are threefold:

(1) A multi-continuous-output TSK fuzzy system is designed and integrated into the framework of reinforcement learning to model the action policy of the robot. The structure of the fuzzy system and the training process are optimized to achieve faster training and robust performance.
(2) The proposed fuzzy system is applied for odor plume tracking control in dynamic airflow. The influence of the reward settings on the trained Multi-Continuous-Output TSK (MCOTSK)-based controller is investigated.
(3) The performance of the proposed odor plume tracking method is compared with a benchmark method and the results are analyzed to investigate how reinforcement learning can promote the searching performance.

The rest of the paper is organized as follows: Section 2 presents the structure of the proposed multi-continuous-output fuzzy inference system and how reinforcement learning is utilized to tune the system. Section 3 presents the filament-based dynamic plumes and how the proposed system is trained and applied in the odor plume tracking task. Section 4 compares the proposed MCOTSK-based controller trained with two different reward settings in a simulated large-dimension scenario with dynamic plumes and analyzes the results. The proposed method is also compared with a benchmark method. Section 5 validates the trained controller on a real robot in odor plume tracking tasks. Section 6 concludes the paper.

2. Methods

2.1. General TSK fuzzy system structure

TSK fuzzy systems are widely used machine learning models for regression problems [Reference Wu, Yuan, Huang and Tan27, Reference Nguyen, Taniguchi, Eciolaza, Campos, Palhares and Sugeno41]. They map the relationships between inputs and outputs through fuzzy logic theory. Setting the parameters of the system does not require prior expert knowledge; instead, learning algorithms such as evolutionary algorithms [Reference Wu and Tan42] and gradient descent [Reference Wang and Mendel43] can be applied to tune them. Figure 1 shows a five-layer TSK fuzzy system architecture.

Figure 1. The architecture of the five-layer TSK fuzzy system. Each layer contains one type of node: either adaptive-parameter nodes, represented by squares, or fixed-parameter nodes, represented by circles. The parameters of the adaptive-parameter nodes can be tuned, and those of the fixed-parameter nodes cannot be tuned.

Assume the input vector of a TSK fuzzy system can be expressed as $\mathbf{x}=\left (x_{1}, \ldots, x_{M}\right )^{T} \in \mathbb{R}^{M \times 1}$ . The input is fuzzified by $R$ rules:

(1)

\begin{align} \text{Rule}\,r\,:\, &\text{IF}\,x_1\,\text{is}\,A_{r,1},\,x_2\,\text{is}\,A_{r,2},\ldots,\,\text{and}\,x_M\,\text{is}\,A_{r,M},\nonumber \\ &\text{THEN}\,y_r(\mathbf{x})= b_{r,0}+\sum _{m=1}^{M}b_{r,m}x_m,\quad (r = 1,\ldots,R), \end{align}

where $A_{r,m} (r = 1,\ldots,R;\,m = 1,\ldots,M)$ are fuzzy sets, $y_r(\mathbf{x})$ is the output of Rule $r$ , and $b_{r,0}$ and $b_{r,m}$ are the weight parameters.

The first layer of the TSK fuzzy system is the fuzzification layer, of which the output can be expressed as:

(2)

\begin{equation} \theta _{r,m}^1=\mu _{A_{r,m}}(x_m)=e^{-\frac{(x_m-c_{r,m})^2}{2a_{r,m}^2}}, \end{equation}

where $\mu _{A_{r,m}}$ are the membership functions (MFs) of the fuzzy sets $A_{r,m} (r = 1,\ldots,R;\,m = 1,\ldots,M)$ and are set to be Gaussian MFs, because they are widely used and their derivatives are easier to compute. $a_{r,m}$ and $c_{r,m}$ are parameters related to the shape of the Gaussian MFs.

In the second layer, all nodes are fixed and marked as $\pi$ . To draw conclusions from the set of rules defined for a TSK fuzzy system, this layer calculates the strength of the premise of each rule, referred to as the “firing strength”, given a set of input values $\left (x_{1}, \ldots, x_{M}\right )^{T}$ and their membership grades $\mu _{A_{r,m}}(x_m)$ in each fuzzy set $ A_{r,m} (r = 1,\ldots,R;\,m = 1,\ldots,M)$ . The outputs of this layer are the products of its inputs, as expressed in Eq. (3). For a given fuzzy rule, if the membership grade of one of the inputs is close to zero, which means the rule can hardly be satisfied, the multiplication ensures that the product of all the membership grades is also close to zero, so the rule is almost inactive in the following layers and contributes little to the decision.

(3)

\begin{equation} \theta _r^2 = \prod _{m=1}^{M}\mu _{A_{r,m}}(x_m)\,(r = 1,\ldots,R). \end{equation}

The third layer is the normalization layer. The outputs of this layer are the normalization of the input signals and represent the contribution of Rule $r$ to the sum of the firing strength of all rules:

(4)

\begin{equation} \theta _r^3 = \frac{\theta _r^2}{\sum _{k=1}^{R}\theta _{k}^2}. \end{equation}

The fourth layer is an adaptive-parameter layer. The output of this layer is the product of the normalized firing strength $\theta _r^3$ and $y_r(\mathbf{x})$ :

(5)

\begin{equation} \theta _r^4 = \theta _r^3y_r(\mathbf{x})=\theta _r^3\left( b_{r,0}+\sum _{m=1}^{M}b_{r,m}x_m\right). \end{equation}

The output of the last layer is the sum of all the input signals:

(6)

\begin{equation} \theta ^5 = \sum _{r=1}^{R}\theta _r^4. \end{equation}
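As a minimal sketch of the five layers in Eqs. (2) to (6), the forward pass of a single-output TSK fuzzy system can be written in plain Python. The function name `tsk_forward`, the list-of-lists parameter layout, and the toy values in the usage lines are illustrative assumptions rather than the authors' implementation; no guard against a vanishing sum of firing strengths is included.

```python
import math

def tsk_forward(x, centers, widths, b0, b):
    """Single-output TSK forward pass following Eqs. (2)-(6).

    x:       list of M inputs
    centers: R x M Gaussian centres c_{r,m}
    widths:  R x M Gaussian widths a_{r,m}
    b0:      R consequent biases b_{r,0}
    b:       R x M consequent weights b_{r,m}
    """
    R, M = len(centers), len(x)
    # Layers 1-2: Gaussian membership grades and their product (firing strength)
    firing = []
    for r in range(R):
        strength = 1.0
        for m in range(M):
            strength *= math.exp(-(x[m] - centers[r][m]) ** 2 / (2.0 * widths[r][m] ** 2))
        firing.append(strength)
    # Layer 3: normalise the firing strengths so that they sum to one
    total = sum(firing)
    normalised = [f / total for f in firing]
    # Layer 4: weight each rule's linear consequent y_r(x)
    theta4 = [
        normalised[r] * (b0[r] + sum(b[r][m] * x[m] for m in range(M)))
        for r in range(R)
    ]
    # Layer 5: sum over all rules
    return sum(theta4), normalised, theta4

# Toy usage (hypothetical values): two rules placed symmetrically around x = 1
y, weights, theta4 = tsk_forward([1.0], [[0.0], [2.0]], [[1.0], [1.0]],
                                 [1.0, 3.0], [[0.0], [0.0]])
```

In the toy call both rules fire equally, so the output is the mean of the two constant consequents.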

2.2. Adapt the TSK fuzzy system for multiple continuous outputs and reinforcement learning

As mentioned in the introduction, a large action space is required for the robot in real-world odor source searching scenarios. The fuzzy inference system (FIS) is expected to have multiple outputs and generate continuous control commands. A greater challenge is that there is usually little expert knowledge that can be used to tune the FIS. In this case, the robot is expected to go through a “trial-and-error” process and dynamically tune the TSK fuzzy system.

To address the above issues, the generic structure of the TSK fuzzy system is adapted into the “MCOTSK fuzzy system” in this paper. The first four layers of MCOTSK are the same as in the generic structure. The fifth layer consists of $P$ nodes, the outputs of which can be expressed as:

(7)

\begin{equation} \theta _p^5 = \boldsymbol{\Theta }_4^{T} \boldsymbol{\Omega }_p, \end{equation}

where $\boldsymbol{\Theta }_4 = \left (\theta _1^4, \ldots, \theta _R^4\right )^{T}$ , and $\boldsymbol{\Omega }_p= \left (\omega _{p,1}, \ldots, \omega _{p,R}\right )^{T}$ , $(p = 1,\ldots,P)$ . $P$ is the total number of the outputs. The structure of MCOTSK and the scheme of tuning the MCOTSK-based controller using a typical reinforcement learning algorithm, Deep Deterministic Policy Gradient (DDPG), are presented in Fig. 2.
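The fifth layer in Eq. (7) amounts to one dot product per output. The following sketch assumes the fourth-layer outputs are already available as a list; the function name `mcotsk_outputs` and the example weights are hypothetical.

```python
def mcotsk_outputs(theta4, omega):
    """Eq. (7): output p is the dot product of Theta_4 with Omega_p.

    theta4: list of R fourth-layer outputs
    omega:  P x R output weights omega_{p,r}
    """
    return [sum(w * t for w, t in zip(omega_p, theta4)) for omega_p in omega]

# Hypothetical example with R = 2 rules and P = 2 outputs
outputs = mcotsk_outputs([1.0, 2.0], [[1.0, 1.0], [0.5, -1.0]])
```

With all weights $\omega _{p,r}$ equal to one, an output reduces to the plain sum in Eq. (6).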

Figure 2. The scheme of tuning the MCOTSK-based controller with the DDPG algorithm. At each step, the olfactory robot perceives the state of the environment (e.g., wind direction and odor concentration) with its sensors. The measured state is fed into the MCOTSK Actor model, and the desired action is determined. The state, the action, and the reward of the action are used to tune the Actor model and the Critic model.

In the framework of the DDPG algorithm, an “Actor” is required to map the state $s$ of the environment to the action $a$ of the robot. In the odor plume tracking task, the state of the environment can be the measured wind direction, wind velocity, odor concentration, etc. The action means control commands for the robot, which can be the turning angle, the movement length, the movement velocity, etc. In this paper, the MCOTSK serves as the Actor model, the inputs of which are the measured states and the outputs of which are the parameters of the odor plume tracking controller. In order to optimize the MCOTSK-based Actor, the adaptive parameters $a_{r,m}$ , $b_{r,m}$ , $b_{r,0}$ , $c_{r,m}$ and $\omega _{p,r}$ $(r = 1,\ldots,R;\,m = 1,\ldots,M;\,p = 1,\ldots,P)$ need to be tuned. The DDPG algorithm also involves a “Critic” model as the action-value function $q(s,a)$ , which calculates the expected cumulative future reward of the current action and state. In addition to the Actor model and the Critic model, a “Target actor” model and a “Target critic” model are initialized identically to the Actor model and the Critic model, respectively.

At each time step $t$ during the training process, an action command $a_t$ is calculated from the input state $s_t$ with the proposed MCOTSK, and the robot takes a corresponding movement. After the robot interacts with the environment and takes another observation, an updated state $s_{t+1}$ is obtained and serves as the input of the MCOTSK at the next step.

Meanwhile, $s_{t+1}$ is sent to the Target actor model to calculate the action command $a_{\text{targ},t+1}$ . The reward $r_t$ the robot gets at time step $t$ and the action value $q_{\text{targ}}$ calculated with the Target critic model are used to calculate the target action value $r_t+\gamma q_{\text{targ}}(s_{t+1},a_{\text{targ},t+1})$ . The Temporal-Difference error between the action value $q(s_t,a_t)$ and the target action value is used to adjust the Critic model by minimizing the loss $L(\phi, \mathcal{D})$ with stochastic gradient descent:

(8)

\begin{equation} L(\phi, \mathcal{D}) =\underset{\left (s_{t}, a_{t}, r_{t}, s_{t+1}\right ) \sim \mathcal{D}}{\textrm{E}}\left [\left (q\left (s_t, a_t\right )-\left (r_t+\gamma q_{\text{targ}}\!\left(s_{t+1}, a_{\text{targ}, t+1}\right )\right )\right )^{2}\right ], \end{equation}

where $\phi$ denotes the parameters of the Critic model, $\gamma$ is the discount factor, and $\mathcal{D}$ is the replay buffer storing previous experience $\left (s_{t}, a_{t}, r_{t}, s_{t+1}\right )$ . The Actor model is optimized by maximizing the action value $\underset{(s,a)\sim \mathcal{D}}{\textrm{E}}[q(s,a)]$ .
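To make the loss in Eq. (8) concrete, the sketch below evaluates the mean squared Temporal-Difference error over a sampled batch. The models are stood in for by plain callables, the function name `critic_loss` is hypothetical, and the discount factor used in the example is an assumed value (the text does not state one). No special handling of terminal transitions is included, since Eq. (8) does not show any.

```python
def critic_loss(batch, q, q_targ, actor_targ, gamma):
    """Mean squared TD error of Eq. (8) over a sampled batch.

    batch:      list of (s, a, r, s_next) tuples drawn from the replay buffer
    q, q_targ:  callables (state, action) -> float
    actor_targ: callable state -> action
    gamma:      discount factor (assumed value; not given in the text)
    """
    total = 0.0
    for s, a, r, s_next in batch:
        a_next = actor_targ(s_next)                  # Target actor proposes a_{targ,t+1}
        target = r + gamma * q_targ(s_next, a_next)  # target action value
        total += (q(s, a) - target) ** 2             # squared TD error
    return total / len(batch)

# Hypothetical stand-in models: constant critics and a constant actor
loss = critic_loss([(0.0, 0.0, 1.0, 0.0)],
                   q=lambda s, a: 0.0,
                   q_targ=lambda s, a: 1.0,
                   actor_targ=lambda s: 0.0,
                   gamma=0.5)
```

In an actual training loop this quantity would be differentiated with respect to the Critic parameters; the sketch only evaluates it.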

The parameters $\phi _{\text{targ}}$ of the Target actor and critic models are updated through a soft updating policy at each training step to make the training process more stable:

(9)

\begin{equation} \phi _{\text{targ}} \leftarrow \rho \phi _{\text{targ}}+(1-\rho ) \phi \end{equation}

where $\rho$ is set as $0.9$ in this paper.
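The soft update in Eq. (9) can be sketched on a flat list of parameters as follows. The function name `soft_update` and the list representation are assumptions made for illustration; a PyTorch implementation would iterate over parameter tensors instead.

```python
def soft_update(target_params, params, rho=0.9):
    """Eq. (9): keep a fraction rho of each target parameter and move the
    remaining (1 - rho) toward the corresponding online parameter."""
    return [rho * pt + (1.0 - rho) * p for pt, p in zip(target_params, params)]

# Hypothetical example: the target moves one tenth of the way toward the online value
new_target = soft_update([0.0, 2.0], [1.0, 2.0])
```

With $\rho = 0.9$ the target models change slowly, which is what keeps the target action value in Eq. (8) from shifting abruptly between updates.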

The DropRule technique [Reference Wu, Yuan, Huang and Tan27] is applied in the training process of the MCOTSK-based Actor model to reduce overfitting and improve generalization. DropRule randomly drops some fuzzy rules during training; that is, at each training iteration, the firing strength of a fuzzy rule is set to zero with probability $P\in (0,1)$ and remains unchanged with probability $1-P$ (here $P$ denotes the drop probability, not the number of outputs). By randomly discarding some fuzzy rules, each rule is forced to work robustly with a randomly remaining subset of rules, so that each rule maximizes its own modeling capability instead of relying on other rules. In addition, LN is used to normalize the firing strengths of the rules. Similar to the LN layer in the Transformer, the LN layer added to the MCOTSK model can mitigate the vanishing-gradient problem and improve performance [Reference Cui44].
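The rule-dropping step described above can be sketched in a few lines. The function name `drop_rule` and the injectable random generator are illustrative assumptions; the sketch covers only the masking of firing strengths during training and does not show the LN layer or where exactly the mask sits in the authors' pipeline.

```python
import random

def drop_rule(firing, drop_prob, rng=random):
    """Zero each rule's firing strength with probability drop_prob and keep it
    unchanged otherwise. Intended for training only."""
    return [0.0 if rng.random() < drop_prob else f for f in firing]

# Hypothetical example with ten rules and a drop probability of 0.2
rng = random.Random(0)
masked = drop_rule([0.1] * 10, 0.2, rng=rng)
```

If every rule happens to be dropped in one iteration, the subsequent normalization would divide by zero, so a practical implementation would need to handle that case.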

3. Application in odor plume tracking

Since it is hard and time-consuming to generate variable dynamic odor plumes with controllable parameters to train the models in the real world, the proposed models are trained in simulated environments in this paper.

In this section, the filament-based dynamic plume model is introduced and utilized to generate random odor plume tracking tasks. An MCOTSK-based Lévy Taxis plume tracking controller is designed. Two reward settings of the plume tracking process are designed and used in the training process of the MCOTSK model.

3.1. Filament-based dynamic odor plume model

In ref. [Reference Farrell, Murlis, Long, Li and Cardé45], a filament-based odor plume model is presented to simulate plumes dispersed in dynamically changing airflow. The modeled odor concentration distribution is intermittent, and the spatial gradient changes rapidly. This model resembles plumes in real-world outdoor scenarios well. The plumes are modeled as a large number of filaments released from the odor source and dispersed by the airflow (illustrated as the red puffs in Fig. 3). In this paper, this model is used to build a simulated environment, in which the dimension of the searching area is $40$ m $\times \,10\,$ m, and the coordinate system is presented in Fig. 3. The position of the odor releasing source is $(5\,\text{m},\,0\,\text{m})$ . The wind velocity is set as $1\,$ m/s. The wind direction is aligned with the X-axis at $t=0$ and changes at each time step. The noise gain on the wind direction is set to $5$ to simulate dynamic airflow.

Figure 3. Illustration of the wind field and the odor plumes simulated by the filament-based model. The star represents the odor source. The yellow round patch covers the area within 2 m from the odor source. The black arrows represent the changing wind field. The dynamic odor plumes are represented by the red puffs. The robot starts within the blue rectangle in each simulated odor plume tracking trial.

The concentration at location $\mathbf{p}$ contributed by the $i$ -th filament is modeled as:

(10)

\begin{equation} C_{i}(\mathbf{p}, t) =\frac{Q}{\sqrt{8 \pi ^{3}} R_{i}^{3}(t)} \exp\! \left (\frac{-r_{i}^{2}(t)}{R_{i}^{2}(t)}\right ) \frac{\text{ molecules }}{\textrm{cm}^{3} \text{ filament }} \end{equation}

(11)

\begin{equation} r_{i}(t) =\left \|\mathbf{p}-\mathbf{p}_{i}(t)\right \|\! \text{cm} \qquad\qquad\qquad\quad\qquad\end{equation}

(12)

\begin{equation} R_i(t) =\left (R_i^{\frac{2}{3}}(0)+\zeta t\right )^{\frac{3}{2}} \qquad\qquad\quad\qquad\qquad\end{equation}

where $Q$ is the filament release rate, $\mathbf{p}_{i}(t)$ is the position of the $i$ -th filament at time step $t$ , $r_i(t)$ is the distance from $\mathbf{p}$ to that filament, $R_i(t)$ is the dispersion radius of the filament, and $\zeta$ is the growth rate of the filaments.
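Eqs. (10) to (12) can be evaluated directly. The sketch below is a minimal illustration with hypothetical function names; units are left to the caller, and summing the contributions of all filaments to obtain the concentration at a point is an assumption implied by the phrase "contributed by the $i$ -th filament" rather than a formula given in the text.

```python
import math

def filament_radius(r0, zeta, t):
    """Eq. (12): dispersion radius at time t of a filament with initial radius r0."""
    return (r0 ** (2.0 / 3.0) + zeta * t) ** 1.5

def filament_concentration(p, p_i, radius, q):
    """Eq. (10): concentration at point p contributed by one filament at p_i."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(p, p_i))  # r_i^2 from Eq. (11)
    scale = q / (math.sqrt(8.0 * math.pi ** 3) * radius ** 3)
    return scale * math.exp(-dist_sq / radius ** 2)

def total_concentration(p, filaments, q):
    """Assumed aggregation: sum over (position, radius) pairs of all filaments."""
    return sum(filament_concentration(p, p_i, rad, q) for p_i, rad in filaments)

# Hypothetical example: one filament at the origin that has grown for 2 time units
rad = filament_radius(1.0, 0.5, 2.0)
c_near = total_concentration((0.0, 0.0), [((0.0, 0.0), rad)], q=1.0)
c_far = total_concentration((5.0, 0.0), [((0.0, 0.0), rad)], q=1.0)
```

Because each filament contributes a narrow Gaussian puff, the summed field is patchy rather than smooth, which is the intermittency the model is meant to reproduce.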

3.2. MCOTSK-based Lévy Taxis plume tracking controller

The plume tracking algorithm in this paper is a modified version of Lévy Taxis, which was originally a random walk-based plume finding method proposed by Pasternak et al. [Reference Pasternak, Bartumeus and Grasso46]. The Lévy Taxis algorithm was modified as Adaptive Lévy Taxis [Reference Emery, Rahbar, Marjovi and Martinoli47] and Fuzzy Lévy Taxis [Reference Chen and Huang9] to work as plume tracking algorithms. With the Lévy Taxis plume tracking algorithm, as soon as the robot starts its odor plume tracking task from a random position in the searching area, it conducts random walk behaviors: at each step, the robot turns its heading $\theta _{\text{a}}$ to the angle $T_a$ and moves forward for a length $M_l$ . $T_a$ and $M_l$ are determined by the distributions presented in Eq. (13) and Eq. (14):

(13)

\begin{equation} T_{a}=\left [2 \cdot \arctan\! \left (\frac{1-\alpha }{1+\alpha } \tan\! (\pi (\text{rnd}-0.5))\right )\right ]+\text{bias} \end{equation}

(14)

\begin{equation} M_{l}=L_{\min } \cdot \text{rnd}^{\dfrac{1}{1-\mu }} \qquad\qquad\qquad\quad\qquad\qquad\qquad\end{equation}

where $\text{rnd}$ follows a uniform distribution $\text{rnd}\sim u(0,1)$ . The variables $\alpha(0\leq \alpha \leq 1)$ and $\mu (1\lt \mu \leq 3)$ are two key parameters adjusting the shapes of the above distributions. $L_{\min}$ is the minimum step length and is $0.5\,\text{m}$ in the training process of MCOTSK. $\text{bias}$ is a function of the upwind angle $\theta _{\text{u}}$ and the robot heading $\theta _{\text{a}}$ , which keeps the center of $T_a$ ’s distribution as a weighted sum of the upwind direction and the current robot heading [formulated as Eq. (15)] to mimic the bio-inspired anemotaxis behaviors. Figure 4 presents an illustration of the wind direction, the upwind angle $\theta _{\text{u}}$ , and the robot heading $\theta _{\text{a}}$ .

(15)

\begin{equation} \text{bias} = \beta \theta _{\text{u}} + (1-\beta )\theta _{\text{a}} \end{equation}

Figure 4. Illustration of the robot heading $\theta _a$ and the upwind angle $\theta _{u}$ .
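One searching step of the Lévy Taxis controller, Eqs. (13) to (15), can be sketched as follows. The function name `levy_taxis_step` is hypothetical, angles are taken in radians, and drawing two independent uniform samples for $T_a$ and $M_l$ is an assumption, since the text uses the same symbol $\text{rnd}$ in both equations. Angle wrapping and any cap on the step length are not shown.

```python
import math
import random

def levy_taxis_step(alpha, beta, mu, theta_u, theta_a, l_min=0.5, rng=random):
    """Sample a turning angle (Eqs. 13 and 15) and a movement length (Eq. 14).

    Requires 0 <= alpha <= 1 and 1 < mu <= 3, as stated in the text.
    """
    # Eq. (15): centre of the turning-angle distribution
    bias = beta * theta_u + (1.0 - beta) * theta_a
    # Eq. (13): wrapped-Cauchy-style turning angle around the bias
    u = rng.random()
    spread = (1.0 - alpha) / (1.0 + alpha)
    turn = 2.0 * math.atan(spread * math.tan(math.pi * (u - 0.5))) + bias
    # Eq. (14): heavy-tailed step length; v lies in (0, 1] to avoid 0 ** negative
    v = 1.0 - rng.random()
    length = l_min * v ** (1.0 / (1.0 - mu))
    return turn, length

# Hypothetical example: alpha = 1 removes the randomness from the turning angle
turn, length = levy_taxis_step(1.0, 0.5, 2.0, theta_u=1.0, theta_a=0.0,
                               rng=random.Random(1))
```

Larger $\alpha$ concentrates the turning angle around the bias, and larger $\mu$ makes long steps rarer, which is why these two parameters together with $\beta$ are natural outputs for a learned controller.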

In order to determine the key parameters $\alpha$ , $\beta$ , and $\mu$ in the Lévy Taxis controller, Adaptive Lévy Taxis [Reference Emery, Rahbar, Marjovi and Martinoli47] formulated the parameters as fixed functions of the concentration gradient $\nabla C = C_{\text{c}} - C_{\text{p}}$ . $C_{\text{c}}$ and $C_{\text{p}}$ are the odor concentration values measured in the current step and the previous step, respectively. In order to enhance the flexibility of the plume tracking controller, ref. [Reference Chen and Huang9] made the parameters the outputs of a Mamdani-type fuzzy system, the inputs of which are $\nabla C$ and $C_{\text{c}}$ . The results in [Reference Chen and Huang9] demonstrated that the Lévy Taxis controller based on the fuzzy system can achieve faster odor source localization in various scenarios, but the rules of the fuzzy system and its membership functions are tuned manually, which requires prior expert knowledge of promising plume tracking behaviors.

In this paper, the MCOTSK model was utilized to determine the parameters $\alpha$ , $\beta$ , and $\mu$ of the Lévy Taxis controller. At each iteration, the robot measures the current odor concentration $C_{\text{c}}$ at its location and calculates the concentration gradient $\nabla C$ . The state vector of the environment $\mathbf{s}=\left (C_{\text{c}}, \nabla C\right )^{T}$ serves as the input of the MCOTSK model. The outputs of MCOTSK go through a Tanh activation layer and are rescaled to their proper range. The rescaled outputs are the determined parameters, and the action of the robot including the turning angle $T_a$ and the movement length $M_l$ can be calculated with the Lévy Taxis controller in Eqs. (13) and (14). The robot will keep moving according to the controller until it finds the odor source or reaches the step limit.
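The Tanh-and-rescale step can be illustrated as below. The linear mapping from $(-1, 1)$ onto each range is an assumption, as is the range $[0, 1]$ for $\beta$ ; the ranges for $\alpha$ and $\mu$ follow the bounds stated with Eqs. (13) and (14). The names `PARAM_RANGES` and `rescale_outputs` are hypothetical.

```python
import math

# Ranges for alpha and mu follow the text; the range for beta is an assumption.
PARAM_RANGES = {"alpha": (0.0, 1.0), "beta": (0.0, 1.0), "mu": (1.0, 3.0)}

def rescale_outputs(raw):
    """Squash raw MCOTSK outputs with tanh, then map (-1, 1) linearly onto
    the range of each controller parameter."""
    params = {}
    for (name, (lo, hi)), value in zip(PARAM_RANGES.items(), raw):
        params[name] = lo + (math.tanh(value) + 1.0) * 0.5 * (hi - lo)
    return params

# Hypothetical example: zero raw outputs land at the middle of each range
params = rescale_outputs([0.0, 0.0, 0.0])
```

Note that $\mu = 1$ is excluded in Eq. (14), so a practical implementation would keep a small margin above the lower bound for strongly negative raw outputs.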

3.3. The training process of MCOTSK

In this paper, the MCOTSK is automatically tuned by the DDPG reinforcement learning algorithm. Each trial of the odor plume tracking task is a training episode in the DDPG algorithm. A trial stops when the robot enters the stopping area (represented by the yellow round patch in Fig. 3), hits the boundaries of the simulated area, or exceeds the limit on the number of searching steps, which is 60 in this paper. At each searching step $t$ , the experience of the robot $\left (s_{t}, a_{t}, r_{t}, s_{t+1}\right )$ is stored in an experience replay buffer $\mathcal{D}$ , the size of which is $5000$ . A batch of experience (batch size = 32) randomly selected from $\mathcal{D}$ is used to tune the Actor and the Critic at each step. An artificial neural network was used to model the Critic, while the proposed MCOTSK was used to model the Actor. The number of rules $R$ is 10, and the DropRule rate is $P = 0.2$ . The learning rate of the Actor is $0.001$ and that of the Critic is $0.002$ . The robot gets the reward $r_t$ at time step $t$ :

(16)

\begin{equation} r_t = \begin{cases}{20} & \text{if the robot enters the stopping area,} \\[4pt]{-10} & \text{if the robot hits the boundaries of the simulated area,} \\[4pt]{-1+r(C_{\text{c}},\theta _{\text{a}},\theta _{\text{u}})}&\text{otherwise.} \end{cases} \end{equation}

In this paper, two reward settings are used to train the models, respectively. In the first setting, the $r(C_{\text{c}},\theta _{\text{a}},\theta _{\text{u}})$ term is formulated as Eq. (17), where $C_0$ is a constant set to 30 in this paper. Since it is designed to let the robot learn the bio-inspired anemotaxis and chemotaxis behaviors, this setting is called the behavior-oriented reward setting in the rest of the paper. In the other reward setting, the $r(C_{\text{c}},\theta _{\text{a}},\theta _{\text{u}})$ term is set to a constant 0. The robot will learn to reach the odor source in as few steps as possible; therefore, this setting is called the result-oriented reward setting.

(17)

\begin{equation} r(C_{\text{c}},\theta _{\text{a}},\theta _{\text{u}}) = \frac{C_{\text{c}}}{C_0}\cos\! (\theta _{\text{u}}-\theta _{\text{a}}). \end{equation}
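The two reward settings in Eqs. (16) and (17) differ only in the shaping term, which the following sketch makes explicit. The function name `step_reward` and the boolean flags are illustrative assumptions; angles are taken in radians.

```python
import math

def step_reward(reached_source, hit_boundary, c_c, theta_a, theta_u,
                behavior_oriented=True, c0=30.0):
    """Eq. (16), with the shaping term of Eq. (17) when behavior_oriented is True
    and a zero shaping term for the result-oriented setting."""
    if reached_source:
        return 20.0
    if hit_boundary:
        return -10.0
    if behavior_oriented:
        shaping = (c_c / c0) * math.cos(theta_u - theta_a)
    else:
        shaping = 0.0
    return -1.0 + shaping

# Hypothetical example: high concentration while heading exactly upwind
r_bor = step_reward(False, False, c_c=30.0, theta_a=0.0, theta_u=0.0)
r_ror = step_reward(False, False, c_c=30.0, theta_a=0.0, theta_u=0.0,
                    behavior_oriented=False)
```

Under the behavior-oriented setting, the per-step penalty is offset when the robot heads upwind inside the plume; under the result-oriented setting, every non-terminal step costs the same.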

The DDPG reinforcement learning algorithm was implemented with PyTorch and run on a computer with an AMD Ryzen 5 2600 six-core processor, an 8 GB memory chip (DDR3 SDRAM), and a GeForce GTX 1050 Ti graphics card. The randomly changing wind field and the filament-based odor plumes were used to train the models. The source code can be found at https://github.com/cxxacxx/MCOTSK. The models were trained for 1000 episodes. During training, we recorded the reward the robot obtained in each episode. Figure 5(a) and (b) present the average reward over every 20 episodes during training with the above two reward settings, respectively. With both reward settings, the average reward started from around −50. The average reward curves converged to around 5 and −10, respectively, after around 400 episodes. The increasing average rewards show that the robot can learn to track the plumes in dynamic airflow and reach the odor source with the proposed MCOTSK model and the reinforcement learning algorithm.

Figure 5. The average reward in each episode in the training process with the two reward settings: (a) the behavior-oriented reward setting; (b) the result-oriented reward setting.

4. Performance evaluation in simulation

In this section, the MCOTSK-based plume tracking controllers trained with two different reward settings are compared with the Fuzzy Lévy Taxis method, which was designed with expert knowledge and proven to be adaptive in various environment settings in ref. [Reference Chen and Huang9]. The test settings are presented, and the results are discussed.

4.1. Simulation settings

To investigate the influence of the reward settings on the MCOTSK-based plume tracking controllers and compare the proposed algorithm with the Fuzzy Lévy Taxis method, plume tracking tests were conducted in a simulated testing environment that is different from the training environment. The robot starts from random positions in the rectangle area shown in Fig. 3 and tracks the odor plumes with the three controllers respectively: (1) the MCOTSK-based controller trained with the behavior-oriented reward setting (MCOTSK-BOR), (2) the MCOTSK-based controller trained with the result-oriented reward setting (MCOTSK-ROR), and (3) Fuzzy Lévy Taxis. For each controller, 200 trials are conducted.

4.2. Evaluation metrics

Three metrics are utilized to evaluate the controllers. The first is the success rate: the proportion of trials in which the robot enters the stopping area near the odor source. The second metric is the number of tracking steps in all successful trials. The third is the distance overhead, which is the traveled distance from the starting position to the stopping position divided by the straight-line distance in the successful trials. The latter two metrics reflect the odor source searching efficiency.
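The three metrics can be computed from logged trials as sketched below. The trial record layout (`success`, `steps`, `path`) and the function name `evaluate_trials` are hypothetical, and taking the straight-line distance between the recorded start and stop positions is an assumed reading of the definition above.

```python
import math

def evaluate_trials(trials):
    """Return (success rate, step counts, distance overheads).

    trials: list of dicts with 'success' (bool), 'steps' (int) and
            'path' (list of (x, y) positions from start to stop).
    Step counts and overheads are reported for successful trials only.
    """
    successes = [t for t in trials if t["success"]]
    success_rate = len(successes) / len(trials)
    steps = [t["steps"] for t in successes]
    overheads = []
    for t in successes:
        path = t["path"]
        travelled = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
        straight = math.dist(path[0], path[-1])
        overheads.append(travelled / straight)
    return success_rate, steps, overheads

# Hypothetical example: one successful L-shaped trajectory and one failed trial
rate, steps, overheads = evaluate_trials([
    {"success": True, "steps": 2, "path": [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]},
    {"success": False, "steps": 60, "path": [(0.0, 0.0), (1.0, 0.0)]},
])
```

A distance overhead of 1 corresponds to a perfectly straight trajectory, so lower values indicate more efficient searching.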

4.3. Simulation results and discussions

The results of the Monte Carlo tests are shown in Fig. 6. The success rates of the three controllers are very similar, which means that all of them are sufficiently capable of odor plume tracking. In terms of the efficiency of the searching process, Fig. 6(b) and (c) show that the MCOTSK trained with the behavior-oriented reward setting achieves a lower distance overhead and fewer steps than the other two controllers. A possible explanation is that the robot learned bio-inspired, well-tuned plume tracking behaviors from the elaborately designed reward setting. However, the design of the behavior-oriented reward setting still required some expert knowledge. The MCOTSK trained with the result-oriented reward setting required little expert knowledge, yet it was still more efficient than the benchmark method. Figure 7 presents a typical plume tracking trajectory of the robot with each controller. The MCOTSK-based controllers produce noticeably more direct trajectories, which are almost aligned with the wind direction, whereas the Fuzzy Lévy Taxis generates a more meandering trajectory.

Figure 6. Simulation results for evaluating the performance of the MCOTSK-BOR controller, the MCOTSK-ROR controller, and the Fuzzy Lévy Taxis controller.

Figure 7. Typical odor source searching trajectories with the three controllers. The blue curves represent the trajectories.

The results demonstrate that tuning the MCOTSK model with the DDPG reinforcement learning algorithm is feasible, and that the trained controllers can achieve even better results than the manually tuned fuzzy controller, which requires expert knowledge. The reward setting does affect the performance of the trained controller, which suggests that future work should design the reward settings carefully so that the robot learns the expected behaviors.

5. Experiments

In this section, the MCOTSK-based plume tracking controller trained with the behavior-oriented reward is validated through robotic experiments. The adaptation of the controller from the simulated environment to the real environment is introduced. The experiment results are presented and discussed.

5.1. Experiment setup

The experiments were conducted in a laboratory at Huazhong University of Science and Technology, the size of which was $3.04$ m $\times \,3.75\,$ m (shown in Fig. 8(a)). A smoke machine was placed in the laboratory to generate smoke plumes. Two electric fans were used to generate indoor turbulent airflow to disperse the plumes.

Figure 8. The setup for the smoke plume tracking experiments.

The olfactory robot described in ref. [Reference Chen and Huang5] was employed to conduct the plume tracking tasks in this paper. The robot (shown in Fig. 8(b)) was modified from a Turtlebot 3. A Gill WindSonic sensor and a Plantower PMS7003 sensor were mounted on the robot to measure the wind direction and the particulate matter concentration, respectively. The sampling rates of the wind sensor and the particulate matter sensor are 4 Hz and 1 Hz, respectively. Because the proposed controller works in a discrete-time manner and takes observations at the beginning of each searching step before moving the robot to a new position, the sampling rates only affect the duration of each observation stage, and do not affect the performance (e.g., the success rate and the distance overhead) of the proposed controller. A Raspberry Pi was mounted on the robot to communicate with the remote PC through User Datagram Protocol (UDP) unicast and to send movement commands to the OpenCR board over a serial connection. The wheels of the robot were actuated by the OpenCR board, which executed the movement commands received from the Raspberry Pi. The real-time position of the robot during the experiments was captured by a camera mounted on the ceiling of the laboratory, which recognized the red and green LED markers on the top of the robot with SwisTrack [Reference Lochmatter, Roduit, Cianci, Correll, Jacot and Martinoli48]. The captured position of the robot was used to record the ground truth trajectories during plume tracking.

5.2. Sim to real adaptations

Since the model is trained in simulated environments, but needs to be deployed on a real robot, some adaptations to the plume tracking algorithm are required.

The first one is to rescale the measured particulate matter concentration to a suitable range of inputs for the MCOTSK. The measured number of particles with a diameter larger than 0.3 μm (PM0.3) in 0.1 L of air around the robot (denoted by $n_{0.3}$ in this paper) varies from around 2000 to more than 30,000 during the plume tracking process, whereas the input $C_{\text{c}}$ in the simulated environment ranges from 0 to around 30. Therefore, in the robotic experiments, the input $C_{\text{c}}$ for the MCOTSK model is calculated by:

(18)

\begin{equation} C_{\text{c}} = \begin{cases} \log_2\!\left(n_{0.3}-n_{\text{baseline}}\right) & \text{if } n_{0.3} > n_{\text{baseline}}, \\[4pt] 0 & \text{otherwise,} \end{cases} \end{equation}

where $n_{\text{baseline}}$ is the number of PM0.3 measured in clean air and is set as 1888 in this paper.
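Equation (18) translates directly into a few lines of code. The sketch below is illustrative only: the function name `concentration_input` is an assumption, while the baseline of 1888 is the value stated in the text.

```python
import math

N_BASELINE = 1888  # PM0.3 count measured in clean air, as stated in the text


def concentration_input(n_03, n_baseline=N_BASELINE):
    """Rescale a raw PM0.3 particle count to the MCOTSK input C_c (Eq. 18)."""
    if n_03 > n_baseline:
        return math.log2(n_03 - n_baseline)
    return 0.0  # at or below the clean-air baseline


for count in (1500, 2000, 30000):
    print(count, round(concentration_input(count), 2))
```

For the counts reported during plume tracking, a reading of 2000 maps to roughly 6.8 and a reading of 30,000 to roughly 14.8, so the rescaled input stays within the 0 to 30 range seen in simulation.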

The second adaptation is that the minimum movement length $L_{\text{min}}$ at each step is set to $0.05$ m, and the maximum movement length is set to $0.2$ m. The moving speed of the robot is 0.079 m/s. This setting ensures the safety of the robot in the small searching area and prevents it from getting burned by the smoke machine. In addition, the radius of the stopping area in the experiment is set to 0.5 m.
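To make these settings concrete, the sketch below bounds a commanded step length, estimates how long the move takes at the stated speed, and checks the stopping condition. It assumes that out-of-range step lengths are simply clipped to the limits, which the text does not state explicitly, and all function names are hypothetical.

```python
import math

L_MIN = 0.05         # minimum movement length per step (m)
L_MAX = 0.20         # maximum movement length per step (m)
SPEED = 0.079        # moving speed of the robot (m/s)
STOP_RADIUS = 0.5    # radius of the stopping area (m)


def clamp_step(length):
    """Clip a commanded step length to [L_MIN, L_MAX] (clipping is an assumption)."""
    return min(max(length, L_MIN), L_MAX)


def travel_time(length):
    """Seconds needed to execute one bounded step at constant speed."""
    return clamp_step(length) / SPEED


def in_stopping_area(robot_xy, source_xy):
    """True once the robot is within STOP_RADIUS of the source."""
    return math.dist(robot_xy, source_xy) <= STOP_RADIUS


print(round(travel_time(L_MIN), 2), round(travel_time(L_MAX), 2))
```

Under these assumptions a single move lasts between about 0.6 s and 2.5 s, so each searching step is dominated by short, bounded displacements that suit the small laboratory.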

5.3. Plume tracking experiments and results

To validate that the relative position of the smoke leakage source and the robot does not affect the performance of the proposed controller, the smoke machine was placed at three different positions in the experiments (see Table I). For each source position, the robot started from three different positions to track the smoke plumes and search for the smoke leakage source.

Table I. The smoke plume tracking experiment settings and the results.

The results of the nine plume tracking experiments are presented in Table I. Figure 9 shows the robot’s trajectories during the nine experiments. The mean and median distance overhead are 1.1385 and 1.0544, respectively, which are close to 1 and match the simulation results well. The results and the trajectories demonstrate that the proposed MCOTSK-based controller makes the robot track the smoke plumes and find the smoke source along an almost straight path. They also indicate that the relative position of the smoke leakage source and the robot does not affect the performance of the proposed controller.

Figure 9. The robot’s trajectories during 9 plume tracking experiments (the trajectories are represented by the blue lines, the location of the smoke leakage source is marked by the red dots, and the stopping area is represented by the yellow patch).

Videos recorded during the experiments are provided with the manuscript as supplementary material.

6. Conclusions

In this paper, a multi-continuous-output TSK fuzzy system was designed and tuned with reinforcement learning. The structure of the fuzzy system and the training process were optimized with advanced techniques in machine learning, including DropRule and LN. The trained fuzzy system-based plume tracking controllers achieve a success rate of around 85%, which is similar to that of a manually tuned benchmark method, and a higher odor source searching efficiency. The results also showed that a well-designed reward setting in the training process can further improve the performance of the controller. The controller was validated through experiments on a real robot, and the experiment results matched the simulation results well.

In our future work, the influence of the optimization techniques on the reinforcement learning TSK fuzzy system will be analyzed. To achieve a robust performance of the fuzzy system-based controller, more rigorous mathematical reasoning and stability analysis are also required [Reference Li, Zhao, Zhang, Wu, Zhang, Li, Li and Su49–Reference Li, Li and Kan52].

Supplementary materials

To view supplementary material for this article, please visit https://doi.org/10.1017/S0263574722001321.

Author contributions

X C and B Y contributed to the conception and implementation of the study. J H contributed to providing the experiment devices and facilities. Y L and C F contributed to supervising the study, reviewing and revising the manuscript.

Financial support

This work was supported by the National Natural Science Foundation of China [Grant U1913205, 62103180, and 52175272]; Guangdong Innovative and Entrepreneurial Research Team Program [Grant 2016ZT06G587]; the China Postdoctoral Science Foundation (2021M701577); the Science, Technology and Innovation Commission of Shenzhen Municipality [ZDSYS20200811143601004 and KYTDPT20181011104007]; the Stable Support Plan Program of Shenzhen Natural Science Fund [Grant 20200925174640002]; and Centers for Mechanical Engineering Research and Education at MIT and SUSTech.

Conflicts of interest

The authors declare no conflicts of interest.

References

Chen, X. and Huang, J., “Odor source localization algorithms on mobile robots: A review and future outlook,” Robot. Auton. Syst. 112(1), 123–136 (2019).

Li, Z., Su, C. Y., Wang, L., Chen, Z. and Chai, T., “Nonlinear disturbance observer-based control design for a robotic exoskeleton incorporating fuzzy approximation,” IEEE Trans. Ind. Electron. 62(9), 5763–5775 (2015).

Ma, D., Mao, W., Tan, W., Gao, J., Zhang, Z. and Xie, Y., “Emission source tracing based on bionic algorithm mobile sensors with artificial olfactory system,” Robotica 40(4), 976–996 (2022).

Larsch, J., Flavell, S. W., Liu, Q., Gordus, A., Albrecht, D. R. and Bargmann, C. I., “A circuit for gradient climbing in C. elegans chemotaxis,” Cell Rep. 12(11), 1748–1760 (2015).

Chen, X. and Huang, J., “Combining particle filter algorithm with bio-inspired anemotaxis behavior: A smoke plume tracking method and its robotic experiment validation,” Measurement 154, 107482 (2020).

Vergassola, M., Villermaux, E. and Shraiman, B. I., “‘Infotaxis’ as a strategy for searching without gradients,” Nature 445(7126), 406–409 (2007).

Chen, X., Marjovi, A., Huang, J. and Martinoli, A., “Particle source localization with a low-cost robotic sensor system: Algorithmic design and performance evaluation,” IEEE Sens. J. 20(21), 13074–13085 (2020).

Arya, S. P., Air Pollution Meteorology and Dispersion, vol. 310 (Oxford University Press, New York, 1999).

Chen, X. and Huang, J., “Towards Environmentally Adaptive Odor Source Localization: Fuzzy Lévy Taxis Algorithm and Its Validation in Dynamic Odor Plumes,” In: 2020 5th International Conference on Advanced Robotics and Mechatronics (ICARM) (2020) pp. 282–287.

Wang, L. and Pang, S., “An Implementation of the Adaptive Neuro-Fuzzy Inference System (ANFIS) for Odor Source Localization,” In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2020) pp. 4551–4558.

Wang, L., Pang, S. and Li, J., “Olfactory-based navigation via model-based reinforcement learning and fuzzy inference methods,” IEEE Trans. Fuzzy Syst. 29(10), 3014–3027 (2021).

Wang, L., Pang, S. and Li, J., “Learn to Trace Odors: Autonomous Odor Source Localization via Deep Learning Methods,” In: 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA) (IEEE, 2021) pp. 1429–1436.

Chen, X., Fu, C. and Huang, J., “A Deep Q-Network for robotic odor/gas source localization: Modeling, measurement and comparative study,” Measurement 183, 109725 (2021).

Hu, H., Song, S. and Chen, C. P., “Plume tracing via model-free reinforcement learning method,” IEEE Trans. Neural Netw. Learn. Syst. 30(8), 2515–2527 (2019).

Chen, C., Zhang, K., Leng, Y., Chen, X. and Fu, C., “Unsupervised sim-to-real adaptation for environmental recognition in assistive walking,” IEEE Trans. Neural Syst. Rehabil. Eng. 30, 1350–1360 (2022).

Cao, Y. and Huang, J., “Neural-network-based nonlinear model predictive tracking control of a pneumatic muscle actuator-driven exoskeleton,” IEEE/CAA J. Autom. Sin. 7(6), 1478–1488 (2020).

Su, H., Hu, Y., Karimi, H. R., Knoll, A., Ferrigno, G. and De Momi, E., “Improved recurrent neural network-based manipulator control with remote center of motion constraints: Experimental results,” Neural Netw. 131(12), 291–299 (2020).

Fang, B., Long, X., Sun, F., Liu, H., Zhang, S. and Fang, C., “Tactile-based fabric defect detection using convolutional neural network with attention mechanism,” IEEE Trans. Instrum. Meas. 71, 1–9 (2022).

Li, Z., Cao, X. and Ding, N., “Adaptive fuzzy control for synchronization of nonlinear teleoperators with stochastic time-varying communication delays,” IEEE Trans. Fuzzy Syst. 19(4), 745–757 (2011).

Yang, C., Jiang, Y., Na, J., Li, Z., Cheng, L. and Su, C.-Y., “Finite-time convergence adaptive fuzzy control for dual-arm robot with unknown kinematics and dynamics,” IEEE Trans. Fuzzy Syst. 27(3), 574–588 (2018).

Chen, X., Leng, Y. and Fu, C., “A supervised-reinforced successive training framework for a fuzzy inference system and its application in robotic odor source searching,” Front Neurorobot. 16, 5962 (2022).

Salehi, M., Pishkenari, H. N. and Zohoor, H., “Position control of a wheel-based miniature magnetic robot using neuro-fuzzy network,” Robotica, 1–16 (2022).

Li, Z., Ren, Z., Zhao, K., Deng, C. and Feng, Y., “Human-cooperative control design of a walking exoskeleton for body weight support,” IEEE Trans. Ind. Inform. 16(5), 2985–2996 (2019).

Su, H., Qi, W., Chen, J. and Zhang, D., “Fuzzy approximation-based task-space control of robot manipulators with remote center of motion constraint,” IEEE Trans. Fuzzy Syst. 30(6), 1564–1573 (2022).

Li, F., Zhang, Z., Wu, Y., Chen, Y., Liu, K. and Yao, J., “Improved fuzzy sliding mode control in flexible manipulator actuated by pmas,” Robotica 40(8), 1–14 (2022).

Veysi, M., Soltanpour, M. R. and Khooban, M. H., “A novel self-adaptive modified bat fuzzy sliding mode control of robot manipulator in presence of uncertainties in task space,” Robotica 33(10), 2045–2064 (2015).

Wu, D., Yuan, Y., Huang, J. and Tan, Y., “Optimize TSK fuzzy systems for regression problems: Minibatch gradient descent with regularization, DropRule and AdaBound (MBGD-RDA),” IEEE Trans. Fuzzy Syst. 28(5), 1003–1015 (2019).

Su, H., Qi, W., Schmirander, Y., Ovur, S. E., Cai, S. and Xiong, X., “A human activity-aware shared control solution for medical human–robot interaction,” Assem. Autom. 42(3), 388–394 (2022).

Yang, B., Huang, J., Chen, X., Xiong, C. and Hasegawa, Y., “Supernumerary robotic limbs: A review and future outlook,” IEEE Trans. Med. Robot. Bionics 3(3), 623–639 (2021).

Chen, X., Zhang, K., Liu, H., Leng, Y. and Fu, C., “A probability distribution model-based approach for foot placement prediction in the early swing phase with a wearable imu sensor,” IEEE Trans. Neural Syst. Rehabil. Eng. 29, 2595–2604 (2021).

Fang, B., Ding, W., Sun, F., Shan, J., Wang, X., Wang, C. and Zhang, X., “Brain-computer interface integrated with augmented reality for human-robot interaction,” IEEE Trans. Cogn. Dev. Syst., 1–1 (2022).

Zhang, K., Luo, J., Xiao, W., Zhang, W., Liu, H., Zhu, J., Lu, Z., Rong, Y., de Silva, C. W. and Fu, C., “A subvision system for enhancing the environmental adaptability of the powered transfemoral prosthesis,” IEEE Trans. Cybern. 51(6), 3285–3297 (2021).

Chen, X., Chen, C., Wang, Y., Yang, B., Ma, T., Leng, Y. and Fu, C., “A piecewise monotonic gait phase estimation model for controlling a powered transfemoral prosthesis in various locomotion modes,” IEEE Robot. Autom. Lett. 7(4), 9549–9556 (2022).

Guo, Y., Song, B., Tang, X., Zhou, X. and Jiang, Z., “A calibration method of non-contact r-test for error measurement of industrial robots,” Measurement 173, 108365 (2021).

Guo, Y., Song, B., Tang, X., Zhou, X. and Jiang, Z., “A measurement method for calibrating kinematic parameters of industrial robots with point constraint by a laser displacement sensor,” Meas. Sci. Technol. 31(7), 075004 (2020).

Guo, Y., Tang, X., Zhou, X., Song, B., Jiang, Z., Xie, Y. and Ye, B., “Continuous measurements with single setup for position-dependent geometric errors of rotary axes on five-axis machine tools by a laser displacement sensor,” Int. J. Adv. Manuf. Technol. 99(5), 1589–1602 (2018).

Cao, Y., Huang, J., Xiong, C.-H., Wu, D., Zhang, M., Li, Z. and Hasegawa, Y., “Adaptive proxy-based robust control integrated with nonlinear disturbance observer for pneumatic muscle actuators,” IEEE/ASME Trans. Mechatron. 25(4), 1756–1764 (2020).

Huang, J., Guan, Z.-H., Matsuno, T., Fukuda, T. and Sekiyama, K., “Sliding-mode velocity control of mobile-wheeled inverted-pendulum systems,” IEEE Trans. Robot. 26(4), 750–758 (2010).

Zhang, F., Xia, R. and Chen, X., “An optimal trajectory planning algorithm for autonomous trucks: Architecture, algorithm, and experiment,” Int. J. Adv. Robot. Syst. 17(2), 1–12 (2020).

Fang, B., Sun, F., Wu, L., Liu, F., Wang, X., Huang, H., Huang, W., Liu, H. and Wen, L., “Multimode grasping soft gripper achieved by layer jamming structure and tendon-driven mechanism,” Soft Robot. 9(2), 233–249 (2022).

Nguyen, A.-T., Taniguchi, T., Eciolaza, L., Campos, V., Palhares, R. and Sugeno, M., “Fuzzy control systems: Past, present and future,” IEEE Comput. Intell. Mag. 14(1), 56–68 (2019).

Wu, D. and Tan, W. W., “Genetic learning and performance evaluation of interval type-2 fuzzy logic controllers,” Eng. Appl. Artif. Intell. 19(8), 829–841 (2006).

Wang, L.-X. and Mendel, J. M., “Back-Propagation Fuzzy System as Nonlinear Dynamic System Identifiers,” In: [1992 Proceedings] IEEE International Conference on Fuzzy Systems (IEEE, 1992) pp. 1409–1418.

Cui, Y., “PyTSK,” (2022). https://github.com/YuqiCui/PyTSK

Farrell, J. A., Murlis, J., Long, X., Li, W. and Cardé, R. T., “Filament-based atmospheric dispersion model to achieve short time-scale structure of odor plumes,” Environ. Fluid Mech. 2(1-2), 143–169 (2002).

Pasternak, Z., Bartumeus, F. and Grasso, F. W., “Lévy-taxis: A novel search strategy for finding odor plumes in turbulent flow-dominated environments,” J. Phys. A Math. Theor. 42(43), 434010 (2009).

Emery, R., Rahbar, F., Marjovi, A. and Martinoli, A., “Adaptive Lévy Taxis for Odor Source Localization in Realistic Environmental Conditions,” In: 2017 IEEE International Conference on Robotics and Automation (ICRA) (2017) pp. 3552–3559.

Lochmatter, T., Roduit, P., Cianci, C., Correll, N., Jacot, J. and Martinoli, A., “SwisTrack - A Flexible Open Source Tracking Software for Multi-Agent Systems,” In: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE, 2008) pp. 4004–4010.

Li, Z., Zhao, K., Zhang, L., Wu, X., Zhang, T., Li, Q., Li, X. and Su, C.-Y., “Human-in-the-loop control of a wearable lower limb exoskeleton for stable dynamic walking,” IEEE/ASME Trans. Mechatron. 26(5), 2700–2711 (2020).

Li, Z., Deng, C. and Zhao, K., “Human-cooperative control of a wearable walking exoskeleton for enhancing climbing stair activities,” IEEE Trans. Ind. Electron. 67(4), 3086–3095 (2019).

Wu, X. and Li, Z., “Cooperative manipulation of wearable dual-arm exoskeletons using force communication between partners,” IEEE Trans. Ind. Electron. 67(8), 6629–6638 (2019).

Li, G., Li, Z. and Kan, Z., “Assimilation control of a robotic exoskeleton for physical human-robot interaction,” IEEE Robot. Autom. Lett. 7(2), 2977–2984 (2022).
