
Manipulate as human: learning task-oriented manipulation skills by adversarial motion priors

Published online by Cambridge University Press: 11 June 2025

Ziqi Ma
Affiliation:
ParisTech Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, P.R. China
Changda Tian
Affiliation:
Department of Automation, Shanghai Jiao Tong University, Shanghai, P.R. China
Yue Gao*
Affiliation:
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, P.R. China; Shanghai Innovation Institute, Shanghai, P.R. China
Corresponding author: Yue Gao; Email: yuegao@sjtu.edu.cn

Abstract

In recent years, there has been growing interest in developing robots and autonomous systems that can interact with humans in a more natural and intuitive way. One of the key challenges in achieving this goal is enabling these systems to manipulate objects and tools in a manner similar to that of humans. In this paper, we propose a novel approach for learning human-style manipulation skills using adversarial motion priors, which we name HMAMP. The approach leverages adversarial networks to model both the complex dynamics of tool and object manipulation and the aim of the manipulation task. The discriminator is trained on a combination of real-world data and simulation data generated by the agent, and it in turn drives the training of a policy that produces realistic motion trajectories matching the statistical properties of human motion. We evaluated HMAMP on a challenging manipulation task, hammering, and the results indicate that HMAMP learns human-style manipulation skills that outperform current baseline methods. Additionally, we demonstrate the potential of HMAMP for real-world applications by performing hammering tasks on a real robot arm. Overall, HMAMP represents a significant step towards robots and autonomous systems that interact with humans in a more natural and intuitive way by learning to manipulate tools and objects in a manner similar to how humans do.
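To make the adversarial-motion-prior idea described in the abstract concrete, the following is a minimal sketch (not the authors' released code) of the kind of discriminator such a method relies on: human reference transitions are treated as "real", the agent's simulated transitions as "fake", and a style reward derived from the discriminator score is combined with a task reward to train the policy. The least-squares objective and reward shaping follow the commonly used adversarial-motion-prior formulation; all names, dimensions, and hyperparameters below are illustrative assumptions.

```python
# Hedged sketch of an adversarial-motion-prior style discriminator.
# Assumptions: transitions are flattened (state, next_state) vectors; the
# network size, learning rate, and batch shapes are placeholders.
import torch
import torch.nn as nn


class MotionDiscriminator(nn.Module):
    """MLP that maps a (state, next_state) transition to a scalar score."""

    def __init__(self, transition_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        return self.net(transitions)


def discriminator_loss(disc, human_batch, agent_batch):
    # Least-squares (LSGAN-style) targets: +1 for human reference motion,
    # -1 for the agent's simulated rollouts.
    d_human = disc(human_batch)
    d_agent = disc(agent_batch)
    return ((d_human - 1.0) ** 2).mean() + ((d_agent + 1.0) ** 2).mean()


def style_reward(disc, agent_batch):
    # Style reward commonly paired with the least-squares discriminator:
    # r = max(0, 1 - 0.25 * (D(s, s') - 1)^2), bounded in [0, 1]. In a
    # task-oriented setting it would be combined with a task reward
    # (e.g., nail displacement for hammering) to train the policy.
    with torch.no_grad():
        d = disc(agent_batch)
        return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0)


if __name__ == "__main__":
    # Toy usage with random stand-ins for human and agent transition batches.
    dim = 64
    disc = MotionDiscriminator(dim)
    opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
    human = torch.randn(32, dim)
    agent = torch.randn(32, dim)
    loss = discriminator_loss(disc, human, agent)
    opt.zero_grad(); loss.backward(); opt.step()
    print(style_reward(disc, agent).mean().item())
```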

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press


Supplementary material

Ma et al. supplementary material 1 (File, 51 MB)
Ma et al. supplementary material 2 (File, 36.2 MB)