Published online by Cambridge University Press: 14 July 2016
Suppose that π is a policy for resource allocation in a stochastic environment and π∗ is an optimal policy. Two existing procedures for policy evaluation are described and compared. Both evaluate π by means of upper bounds on R(π∗) − R(π), the total reward lost when making resource allocations according to π rather than π∗. The bounds developed by these two methods are called Type 1 and Type 2. We demonstrate by example that neither procedure dominates the other in the sense of always yielding tighter bounds. A modification to the Type 2 bounds is proposed, resulting in an improved procedure which always dominates the Type 1 approach.
During the course of this research the author was supported by the National Research Council as a Senior Research Associate at the Department of Operations Research, Naval Postgraduate School, Monterey, CA 93943–5000, USA.