Data augmentation by separating identity and emotion representations for emotional gait recognition

Published online by Cambridge University Press: 06 February 2023

Weijie Sheng
Affiliation:
Yangzhou Collaborative Innovation Research Institute Co., Ltd., Institute of Shenyang Aircraft Design and Research, Yangzhou, 225000, China; Key Laboratory of Measurement and Control of CSE Ministry of Education, School of Automation, Southeast University, Nanjing, China
Xiaoyan Lu
Affiliation:
School of Cyber Science and Engineering, Southeast University, Nanjing, China
Xinde Li*
Affiliation:
Key Laboratory of Measurement and Control of CSE Ministry of Education, School of Automation, Southeast University, Nanjing, China; School of Cyber Science and Engineering, Southeast University, Nanjing, China
*Corresponding author. Email: xindeli@seu.edu.cn

Abstract

Human-centered intelligent human–robot interaction can transcend the traditional keyboard and mouse and understand human communicative intentions by actively mining implicit human cues (e.g., identity information and emotional information) to meet individuals’ needs. Gait is a unique biometric feature that can provide reliable information for recognizing emotions even when viewed from a distance. However, the insufficient amount and diversity of training data annotated with emotions severely hinder the application of gait emotion recognition. In this paper, we propose an adversarial learning framework for emotional gait dataset augmentation, with which a two-stage model can be trained to generate synthetic emotional samples by separating identity and emotion representations from gait trajectories. To our knowledge, this is the first work to realize the mutual transformation between natural gait and emotional gait. Experimental results reveal that the synthetic gait samples generated by the proposed networks are rich in emotional information. As a result, the emotion classifier trained on the augmented dataset is competitive with state-of-the-art gait emotion recognition works.

Type
Research Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press
