
CONVERGENCE OF SIMULATION-BASED POLICY ITERATION

Published online by Cambridge University Press: 27 February 2003

William L. Cooper
Department of Mechanical Engineering, University of Minnesota, Minneapolis, MN 55455, E-mail: billcoop@me.umn.edu

Shane G. Henderson
School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY 14853, E-mail: shane@orie.cornell.edu

Mark E. Lewis
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109-2117, E-mail: melewis@engin.umich.edu

Abstract

Simulation-based policy iteration (SBPI) is a modification of the policy iteration algorithm for computing optimal policies for Markov decision processes. At each iteration, rather than solving the average evaluation equations, SBPI uses simulation to estimate a solution to these equations. For recurrent average-reward Markov decision processes with finite state and action spaces, we provide easily verifiable conditions ensuring that, almost surely, simulation-based policy iteration eventually never leaves the set of optimal decision rules. We analyze three simulation estimators for solutions to the average evaluation equations. Using our general results, we derive simple conditions on the simulation run lengths that guarantee the almost-sure convergence of the algorithm.
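To make the loop described above concrete, here is a minimal sketch, not the authors' method. For a decision rule d, the average evaluation equations take the standard form g + h(i) = r(i, d(i)) + Σ_j p(j | i, d(i)) h(j), normalized so that h(ref) = 0. SBPI replaces the exact solve with a simulation estimate of (g, h) and then performs the usual greedy improvement. The abstract does not specify the paper's three estimators, so `simulate_evaluation` uses a hypothetical regenerative estimator: g is estimated as total reward divided by total time over cycles returning to a reference state, and h(i) as the centred reward accumulated from i until that state is first hit. The function names, the reference state 0, and the run-length schedule `base_runs * k` are all illustrative assumptions. The schedule is not a reproduction of the paper's conditions.

```python
import numpy as np


def decision_rule_chain(P, r, policy):
    """Transition matrix and one-step rewards of the chain induced by a decision rule."""
    idx = np.arange(P.shape[0])
    return P[idx, policy], r[idx, policy]   # shapes (S, S) and (S,)


def simulate_evaluation(P, r, policy, n_runs, rng, ref=0):
    """Hypothetical regenerative estimator of (g, h) for a fixed decision rule.

    g_hat: total reward / total time over n_runs cycles from ref back to ref.
    h_hat[i]: mean over n_runs paths from i of sum(r - g_hat) until ref is hit.
    h_hat[ref] = 0 by normalization.
    """
    P_d, r_d = decision_rule_chain(P, r, policy)
    n_states = P_d.shape[0]

    def step(x):
        return rng.choice(n_states, p=P_d[x])

    # Gain: ratio estimator over n_runs regeneration cycles at ref.
    total_reward, total_time = 0.0, 0
    for _ in range(n_runs):
        x = ref
        while True:
            total_reward += r_d[x]
            total_time += 1
            x = step(x)
            if x == ref:
                break
    g_hat = total_reward / total_time

    # Relative values: centred reward accumulated until the first visit to ref.
    h_hat = np.zeros(n_states)
    for i in range(n_states):
        if i == ref:
            continue
        acc = 0.0
        for _ in range(n_runs):
            x = i
            while x != ref:
                acc += r_d[x] - g_hat
                x = step(x)
        h_hat[i] = acc / n_runs
    return g_hat, h_hat


def improve(P, r, h, policy):
    """Greedy improvement; keep the current action whenever it is a maximizer."""
    Q = r + P @ h                      # Q[i, a] = r(i, a) + sum_j p(j | i, a) h(j)
    current = Q[np.arange(len(policy)), policy]
    return np.where(current >= Q.max(axis=1), policy, Q.argmax(axis=1))


def sbpi(P, r, policy, n_iters, base_runs, rng):
    """SBPI sketch with an assumed run length of base_runs * k at iteration k."""
    policy = np.asarray(policy)
    for k in range(1, n_iters + 1):
        _, h_hat = simulate_evaluation(P, r, policy, base_runs * k, rng)
        policy = improve(P, r, h_hat, policy)
    return policy


def exact_gain(P, r, policy):
    """Exact long-run average reward of a decision rule (for comparison only)."""
    P_d, r_d = decision_rule_chain(P, r, policy)
    n = P_d.shape[0]
    A = P_d.T - np.eye(n)
    A[-1] = 1.0                        # replace one balance equation by sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    pi = np.linalg.solve(A, b)
    return float(pi @ r_d)


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action example (every decision rule is recurrent).
    P = np.array([
        [[0.9, 0.1], [0.1, 0.9]],      # state 0: action 0 "stay", action 1 "move"
        [[0.5, 0.5], [0.5, 0.5]],      # state 1: both actions identical
    ])
    r = np.array([[1.0, 0.0], [10.0, 10.0]])
    policy = sbpi(P, r, [0, 0], n_iters=10, base_runs=50, rng=np.random.default_rng(0))
    print(policy, exact_gain(P, r, policy))
```

In this toy example the exact gains are 2.5 for "stay" and 90/14 ≈ 6.43 for "move" in state 0. The improvement margin is large, so the sketch should settle on action 1 in state 0. State 1 keeps its initial action because its two actions tie exactly. Keeping the incumbent action on ties mirrors the usual policy iteration convention. Whether a given run-length schedule actually guarantees almost-sure convergence is the question the paper addresses, and this sketch does not establish it.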

Type: Research Article
Copyright © 2003 Cambridge University Press
