
Selection of trajectory parameters for dynamic pouring tasks based on exploitation-driven updates of local metamodels

Published online by Cambridge University Press: 08 May 2017

Joshua D. Langsfeld
Affiliation:
Maryland Robotics Center, Institute for Systems Research, University of Maryland, College Park, MD, USA. E-mail: jdlangs@umd.edu
Krishnanand N. Kaipa
Affiliation:
Department of Mechanical and Aerospace Engineering, Old Dominion University, Norfolk, VA, USA. E-mail: kkaipa@odu.edu
Satyandra K. Gupta*
Affiliation:
Center for Advanced Manufacturing, Department of Aerospace and Mechanical Engineering, University of Southern California, Los Angeles, CA, USA
*Corresponding author. E-mail: guptask@usc.edu

Summary

We present an approach that allows a robot to generate trajectories to perform a set of instances of a task using only a few physical trials. Specifically, we address manipulation tasks that are highly challenging to simulate due to complex dynamics. Our approach allows a robot to build a model from initial exploratory experiments and subsequently improve it to find trajectory parameters that successfully perform a given task instance. First, in a model generation phase, local models are constructed in the vicinity of previously conducted experiments; these models capture both the task function behavior and an estimate of how far the generated model diverges from the true model as one moves away from each experiment. Second, in an exploitation-driven updating phase, the generated models are used to select parameters for a desired task outcome, and the models are updated based on the actual outcome of each task execution. Because the local models are built within adaptively chosen neighborhoods, the algorithm can capture arbitrarily complex function landscapes. We first validate our approach on a synthetic non-linear function approximation problem, where we also analyze the benefit of the core features of the approach. We then show results with a physical robot performing a dynamic fluid pouring task. The real-robot results reveal that the correct pouring parameters for a new pour volume can be learned rapidly, after only a limited number of exploratory experiments.
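To make the two-phase structure concrete, the following is a minimal, hypothetical sketch in Python, not the authors' implementation. Here experiment() stands in for a physical trial, LocalMetamodels fits local linear models around past experiments, and solve_task() runs the exploratory phase followed by exploitation-driven updates toward a target outcome. The sketch uses a fixed k-nearest neighborhood and omits the paper's divergence estimates and adaptive neighborhood selection; all names and parameters are illustrative.

```python
import numpy as np

def experiment(x):
    # Hypothetical stand-in for a physical trial with complex dynamics
    # (e.g., poured volume as a function of a tilt-trajectory parameter).
    return np.sin(3.0 * x) + 0.5 * x

class LocalMetamodels:
    """Local linear models built around previously conducted experiments."""

    def __init__(self):
        self.X, self.Y = [], []

    def add(self, x, y):
        self.X.append(x)
        self.Y.append(y)

    def predict(self, x, k=3):
        # Fit a linear model to the k experiments nearest to x. (The paper
        # instead chooses the neighborhood adaptively and also models
        # divergence from the true function; this sketch omits both.)
        X, Y = np.array(self.X), np.array(self.Y)
        idx = np.argsort(np.abs(X - x))[:k]
        A = np.column_stack([X[idx], np.ones(len(idx))])
        coef, *_ = np.linalg.lstsq(A, Y[idx], rcond=None)
        return coef[0] * x + coef[1]

def solve_task(model, y_target, bounds, n_init=5, n_trials=10, tol=1e-2):
    # Phase 1 (model generation): exploratory trials seed the local models.
    for x in np.linspace(bounds[0], bounds[1], n_init):
        model.add(x, experiment(x))
    # Phase 2 (exploitation-driven updates): repeatedly pick the parameter
    # whose predicted outcome is closest to the target, execute it, and
    # fold the actual outcome back into the models.
    for _ in range(n_trials):
        grid = np.linspace(bounds[0], bounds[1], 200)
        x_star = min(grid, key=lambda x: abs(model.predict(x) - y_target))
        y = experiment(x_star)
        model.add(x_star, y)
        if abs(y - y_target) < tol:
            break
    return x_star, y

x_star, y = solve_task(LocalMetamodels(), y_target=1.0, bounds=(0.0, 2.0))
print(f"selected parameter {x_star:.3f} -> outcome {y:.3f}")
```

In this toy setting the exploitation loop typically converges to the target outcome in a handful of trials, mirroring the paper's observation that new task instances can be solved with few physical experiments once the initial exploratory data is in place.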

Type: Articles
Copyright: © Cambridge University Press 2017

