Optimal motion planning by reinforcement learning in autonomous mobile vehicles

M. Gómez; R. V. González; T. Martínez-Marín; D. Meziat; S. Sánchez

doi:10.1017/S0263574711000452


Published online by Cambridge University Press: 19 May 2011


Affiliations:
M. Gómez*, R. V. González, D. Meziat and S. Sánchez: Departamento de Automática, Escuela Politécnica Superior, Universidad de Alcalá, Campus Universitario, 28871 Alcalá de Henares, Madrid, Spain
T. Martínez-Marín: Departamento de Física, Ingeniería de Sistemas y Teoría de la Señal, Universidad de Alicante, 03080 Alicante, Spain
*Corresponding author. E-mail: mgomez@aut.uah.es


Summary

This work implements and tests, under real conditions, a new algorithm that combines cell-mapping techniques with reinforcement learning methods to obtain optimal motion plans for a vehicle subject to kinematic, dynamic and obstacle constraints. The algorithm extends the control adjoining cell mapping technique so that it learns the dynamics of the vehicle instead of relying on the vehicle's analytical state equations. It applies a transformation of the cell-to-cell mapping to reduce the time spent in the learning stage. Experimental results obtained with a real vehicle show that the algorithm performs satisfactorily.
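The summary describes two stages: learning the cell-to-cell transitions from the vehicle's observed behaviour rather than from its state equations, and then computing an optimal plan over that learned map. The sketch below illustrates that general idea on a toy grid; it is not the authors' algorithm. The function `true_step`, the grid size, the obstacle set, the unit costs and the four compass actions are all hypothetical assumptions, and the toy vehicle is holonomic, unlike the car-like vehicle in the paper. The planning stage uses plain Bellman (value-iteration) backups, and the paper's transformation for shortening the learning stage is not reproduced.

```python
import math

# Hypothetical toy workspace: a W x H grid of unit cells; obstacle cells are forbidden.
W, H = 6, 4
OBSTACLES = {(2, 0), (2, 1), (2, 2)}
ACTIONS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def true_step(x, y, action):
    """Stand-in for the real vehicle (hypothetical unit-speed moves).

    The learner never inspects this function; it only observes its outputs.
    """
    dx, dy = ACTIONS[action]
    return x + dx, y + dy


def to_cell(x, y):
    """Map a continuous position to the index of the cell that contains it."""
    return math.floor(x), math.floor(y)


def valid(cell):
    """A cell is usable if it lies inside the grid and is not an obstacle."""
    cx, cy = cell
    return 0 <= cx < W and 0 <= cy < H and cell not in OBSTACLES


def learn_cell_map():
    """Learning stage: apply every action from every cell centre and record
    the observed cell-to-cell transition together with its cost."""
    cell_map = {}
    for cx in range(W):
        for cy in range(H):
            if not valid((cx, cy)):
                continue
            for a in ACTIONS:
                nxt = to_cell(*true_step(cx + 0.5, cy + 0.5, a))
                if valid(nxt):  # discard moves into obstacles or off the grid
                    cell_map[(cx, cy), a] = (nxt, 1.0)  # (next cell, cost)
    return cell_map


def value_iteration(cell_map, target):
    """Planning stage: repeat V(c) = min_a [cost + V(c')] until nothing changes."""
    cells = {c for (c, _a) in cell_map} | {target}
    V = {c: math.inf for c in cells}
    V[target] = 0.0
    policy = {}
    changed = True
    while changed:
        changed = False
        for (c, a), (nxt, cost) in cell_map.items():
            if c == target:
                continue
            cand = cost + V.get(nxt, math.inf)
            if cand < V[c]:
                V[c], policy[c] = cand, a
                changed = True
    return V, policy


def rollout(policy, cell_map, start, target, max_steps=100):
    """Follow the greedy policy through the learned map from start to target."""
    path = [start]
    while path[-1] != target and len(path) <= max_steps:
        c = path[-1]
        path.append(cell_map[c, policy[c]][0])
    return path


if __name__ == "__main__":
    cmap = learn_cell_map()
    V, pi = value_iteration(cmap, target=(5, 0))
    print(V[(0, 0)], rollout(pi, cmap, (0, 0), (5, 0)))
```

Because the obstacle column only leaves the top row open, the cheapest route from cell (0, 0) to cell (5, 0) climbs to y = 3, crosses, and descends, for a cost of 11 steps. Replacing `true_step` with measurements from a physical vehicle would turn the same learned-map structure into a model learned from data rather than from equations, which is the spirit of the approach summarised above.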

Keywords

Adjoining cell mapping; Non-holonomic systems; Optimal motion planning; Reinforcement learning

Information

Type: Articles
Journal: Robotica, Volume 30, Issue 2, March 2012, pp. 159–170

DOI: https://doi.org/10.1017/S0263574711000452
Copyright: Copyright © Cambridge University Press 2011


