Improving RGB-D SLAM in dynamic environments using semantic aided segmentation

Published online by Cambridge University Press: 16 November 2021

Lhilo Kenye*
Affiliation:
Centre of Intelligent Robotics, Indian Institute of Information Technology, Allahabad, Prayagraj, India; NavAjna Technologies Pvt. Ltd., Hyderabad, India
Rahul Kala
Affiliation:
Centre of Intelligent Robotics, Indian Institute of Information Technology, Allahabad, Prayagraj, India
*Corresponding author. E-mail: lkenye02@gmail.com

Summary

Most conventional simultaneous localization and mapping (SLAM) approaches assume that the working environment is static. In a highly dynamic environment, this assumption exposes the limitations of a SLAM algorithm that, despite including optimization techniques, lacks modules dedicated to handling dynamic objects. This work targets such environments and reduces the effect of dynamic objects on a SLAM algorithm by separating features belonging to dynamic objects from those belonging to the static background using a generated binary mask image. While the features belonging to the static region are used for performing SLAM, the features belonging to non-static segments are reused instead of being eliminated. The approach employs a deep neural network (DNN)-based object detection module to obtain bounding boxes and then generates a lower-resolution binary mask image using a depth-first search algorithm over the detected semantics, which segments the foreground from the static background. In addition, the features belonging to dynamic objects are tracked into consecutive frames to obtain better masking consistency. The proposed approach is tested on both a publicly available dataset and a self-collected dataset, covering both indoor and outdoor environments. The experimental results show that removing features belonging to dynamic objects from a SLAM algorithm can significantly improve the overall output in a dynamic scene.
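The masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not state what criterion the depth-first search uses, so the depth-similarity test, the cell size, the choice of seed at the box centre, and the names `build_mask` and `split_features` are all assumptions made for illustration. Bounding boxes are taken as given, standing in for the output of the DNN detector.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), max exclusive
Point = Tuple[float, float]       # (x, y) pixel position of a feature


def build_mask(depth: Sequence[Sequence[float]], boxes: Sequence[Box],
               cell: int = 4, depth_tol: float = 0.2) -> List[List[int]]:
    """Return a low-resolution binary mask (1 = assumed dynamic foreground).

    ASSUMPTION: cells are grown from the box centre by depth similarity;
    the source only says a depth-first search is run over the detections.
    """
    rows, cols = len(depth) // cell, len(depth[0]) // cell
    # Lower-resolution grid: one depth sample (top-left pixel) per cell.
    grid = [[depth[r * cell][c * cell] for c in range(cols)] for r in range(rows)]
    mask = [[0] * cols for _ in range(rows)]

    for x0, y0, x1, y1 in boxes:
        # Convert the pixel box to inclusive cell coordinates, clamped to the grid.
        c0, r0 = max(x0 // cell, 0), max(y0 // cell, 0)
        c1, r1 = min((x1 - 1) // cell, cols - 1), min((y1 - 1) // cell, rows - 1)
        if c0 > c1 or r0 > r1:
            continue
        seed = ((r0 + r1) // 2, (c0 + c1) // 2)
        stack, seen = [seed], {seed}
        while stack:                      # iterative depth-first search
            r, c = stack.pop()
            mask[r][c] = 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (r0 <= nr <= r1 and c0 <= nc <= c1
                        and (nr, nc) not in seen
                        and abs(grid[nr][nc] - grid[r][c]) <= depth_tol):
                    seen.add((nr, nc))
                    stack.append((nr, nc))
    return mask


def split_features(keypoints: Sequence[Point], mask: Sequence[Sequence[int]],
                   cell: int = 4) -> Tuple[List[Point], List[Point]]:
    """Split features into (static, dynamic) by looking them up in the mask."""
    static, dynamic = [], []
    for x, y in keypoints:
        r, c = int(y) // cell, int(x) // cell
        inside = 0 <= r < len(mask) and 0 <= c < len(mask[0])
        (dynamic if inside and mask[r][c] else static).append((x, y))
    return static, dynamic
```

In this sketch the static list would be passed to the SLAM front end, while the dynamic list is kept rather than discarded, for example so that it can be tracked into the next frame. One limitation of the assumed seeding rule is that a box whose centre falls on the background would grow the mask over the background instead of the object.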

Type
Research Article
Copyright
© The Author(s), 2021. Published by Cambridge University Press
