
Influence of modeling structure in probabilistic sequential decision problems

Published online by Cambridge University Press:  12 October 2006

Florent Teichteil-Königsbuch
Affiliation:
ONERA-DCSD, 2 Avenue Édouard-Belin, 31055 Toulouse, France; e-mail: florent.teichteil@onera.fr; patrick.fabiani@onera.fr
Patrick Fabiani
Affiliation:
ONERA-DCSD, 2 Avenue Édouard-Belin, 31055 Toulouse, France; e-mail: florent.teichteil@onera.fr; patrick.fabiani@onera.fr

Abstract

Markov Decision Processes (MDPs) are a classical framework for stochastic sequential decision problems, based on an enumerated state space representation. More compact and structured representations have been proposed: factorization techniques use state variable representations, while decomposition techniques are based on a partition of the state space into sub-regions and take advantage of the resulting structure of the state transition graph. We use a family of probabilistic exploration-like planning problems to study the influence of the modeling structure on the MDP solution. We first discuss the advantages and drawbacks of a graph-based representation of the state space, then present our comparison of two decomposition techniques, and propose a global approach combining both state space factorization and decomposition techniques. On the exploration problem instance, we propose to take advantage of the natural topological structure of the navigation space, which is partitioned into regions. Local policies are optimized within each region and become the macro-actions of the global abstract MDP resulting from the decomposition; the regions are the corresponding macro-states of the abstract MDP. The global abstract MDP is obtained in factored form, combining all the initial MDP state variables with one macro-state “region” variable standing for the different possible macro-states corresponding to the regions. Further research is currently being conducted on efficient solution algorithms implementing the same hybrid approach for tackling large MDPs.
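The enumerated-state baseline against which the structured representations above are compared can be sketched with standard value iteration over explicit transition and reward matrices. This is a minimal illustration, not the authors' implementation; the two-state toy problem below is hypothetical and chosen only to make the Bellman backup concrete.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, eps=1e-8):
    """Solve an enumerated-state MDP by value iteration.

    P: array of shape (A, S, S); P[a, s, s'] is the transition probability.
    R: array of shape (A, S); expected immediate reward for action a in state s.
    Returns the optimal value function V (shape (S,)) and a greedy policy (shape (S,)).
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * (P @ V)        # Bellman backup: Q[a, s] for every action
        V_new = Q.max(axis=0)          # greedy improvement over actions
        if np.abs(V_new - V).max() < eps:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Hypothetical two-state problem: in state 0 the agent can "wait" (action 0)
# or "move" (action 1) into the absorbing state 1; moving yields reward 1.
P = np.array([[[1.0, 0.0],    # action 0: stay put
               [0.0, 1.0]],
              [[0.0, 1.0],    # action 1: go to state 1 (absorbing)
               [0.0, 1.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
V, policy = value_iteration(P, R)     # optimal: move in state 0
```

In the decomposition described in the abstract, the primitive actions of such a flat MDP would be replaced by locally optimized policies (macro-actions) and the enumerated states by region macro-states, so that the same kind of backup runs on a much smaller abstract MDP.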

Type
Research Article
Copyright
© EDP Sciences, 2006

