The heart of prospect theory is the value function, which proposes that the carriers of value are positive or negative changes from a reference point. Daniel Kahneman observed that if prospect theory had a flag, the value function would be drawn on it. The function is nonlinear, reflecting diminishing sensitivity to magnitude. When describing how human lives are valued, the function exposes profound incoherence. An individual life is highly valued and thus vigorously protected if it is the only life at risk. But that life loses its value when it is one of many endangered by a larger tragedy. Beyond this insensitivity, the function may actually decline when the many lives at risk become mere numbers. The more who die, the less we care. Implications of this deadly ‘arithmetic of compassion’ for understanding and managing the risk from nuclear weapons are briefly discussed.
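For orientation, the canonical value function of cumulative prospect theory (Tversky and Kahneman, 1992) is often written as follows; the abstract above does not commit to this exact parameterization, so take it as background rather than as the form used in the article:

\[
v(x) =
\begin{cases}
x^{\alpha}, & x \ge 0,\\
-\lambda(-x)^{\beta}, & x < 0,
\end{cases}
\qquad \alpha \approx \beta \approx 0.88,\quad \lambda \approx 2.25,
\]

so that both gains and losses are valued with diminishing sensitivity, and losses loom larger than gains.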
Reinforcement learning (RL) is a computational framework in which an active agent learns behaviors on the basis of scalar reward feedback. The theory of reinforcement learning was developed in the artificial intelligence community, drawing on intuitions from psychology and animal learning theory and on a mathematical foundation in control theory. It has been successfully applied to tasks such as game playing and robot control. Reinforcement learning also gives a theoretical account of behavioral learning in humans and animals and of the underlying brain mechanisms, such as dopamine signaling and the basal ganglia circuit. It thus serves as a “common language” in which engineers, biologists, and cognitive scientists can exchange their problems and findings about goal-directed behaviors. This chapter introduces the basic theoretical framework of reinforcement learning and reviews its impact in artificial intelligence, neuroscience, and cognitive science.
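As a minimal illustration of the framework described above, the sketch below implements tabular Q-learning on a made-up five-state chain; the environment, the reward of 1 at the right end, and all hyperparameters are assumptions chosen for the example, not taken from the chapter.

```python
import random

# Hypothetical 5-state chain; reaching the rightmost state yields reward 1.
N_STATES, ACTIONS = 5, (-1, +1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    # Deterministic toy dynamics with a terminal rewarding state on the right.
    s_next = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy action selection.
        a = (random.choice(ACTIONS) if random.random() < EPSILON
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s_next, r = step(s, a)
        # Temporal-difference (Q-learning) update toward r + gamma * max_a' Q(s', a').
        target = r + GAMMA * max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s_next

# Greedy policy learned from the scalar reward signal alone.
print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)})
```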
In the economics literature, there are two dominant approaches to solving models with optimal experimentation (also called active learning): the first is based on the value function and the second on an approximation method. In principle, the value function approach is preferred. However, it suffers from the curse of dimensionality and is applicable only to small problems with a limited number of policy variables. The approximation method can handle a computationally larger class of models, but may produce results that deviate from the optimal solution. Our simulations indicate that when the effects of learning are limited, the differences may be small. However, when there is sufficient scope for learning, the value function solution appears more aggressive in its use of the policy variable.
This paper pioneers a Freidlin–Wentzell approach to stochastic impulse control of exchange rates when the central bank desires to maintain a target zone. Pressure to stimulate the economy forces the bank to implement a diffusion monetary policy involving Freidlin–Wentzell perturbations indexed by a parameter ε ∈ [0,1]. If ε = 0, the policy keeps exchange rates in the target zone for all times t ≥ 0. When ε > 0, exchange rates continually exit the target zone almost surely, triggering central bank interventions which force currencies back into the zone or abandonment of all targets. Interventions and target zone deviations are costly, motivating the bank to minimize these joint costs for any ε ∈ [0,1]. We prove convergence of the value functions as ε → 0, yielding a value function approximation for small ε. Via sample path analysis and cost function bounds, intervention followed by target zone abandonment emerges as the optimal policy.
We prove the continuity and the Hölder equivalence, with respect to a Euclidean distance, of the value function associated with the L1 cost of the control-affine system q̇ = f₀(q) + ∑_{j=1}^m u_j f_j(q) satisfying the strong Hörmander condition. This is done by proving a result in the same spirit as the Ball–Box theorem for driftless (or sub-Riemannian) systems. The techniques used are based on a reduction of the control-affine system to a linear but time-dependent one, for which we are able to define a generalization of the nilpotent approximation and through which we derive estimates for the shape of the reachable sets. Finally, we also prove the continuity of the value function associated with the L1 cost of time-dependent systems of the form q̇ = ∑_{j=1}^m u_j f_j^t(q).
We consider a single-server queue with Poisson input operating under first-come–first-served (FCFS) or last-come–first-served (LCFS) disciplines. The service times of the customers are independent and obey a general distribution. The system is subject to costs for holding a customer per unit of time, which can be customer specific or customer class specific. We give general expressions for the corresponding value functions, which have elementary compact forms, similar to the Pollaczek–Khinchine mean value formulae. The results generalize earlier work where similar expressions have been obtained for specific service time distributions. The obtained value functions can be readily applied to develop nearly optimal dispatching policies for a broad range of systems with parallel queues, including multiclass scenarios and the cases where service time estimates are available.
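For context, the Pollaczek–Khinchine mean value formula that the abstract's value functions are said to resemble gives, for an M/G/1 FCFS queue with arrival rate λ, service time S, and load ρ = λE[S] < 1, the mean waiting time (the paper's value functions themselves are not reproduced here):

\[
E[W] \;=\; \frac{\lambda\, E[S^2]}{2\,(1-\rho)} .
\]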
This paper presents a closed-form characterization of the allocation of resources in an overlapping generations model of two-sided, partial altruism. Three assumptions are made: (i) parents and children play Markov strategies, (ii) utility takes the CRRA form, and (iii) the income of children is stochastic but proportional to the saving of parents. In families where children are rich relative to their parents, saving rates—measured as a function of the family's total resources—are higher than when children are poor relative to their parents. Income redistribution from the old to the young, therefore, leads to an increase in aggregate saving.
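Assumption (ii) refers to the standard constant-relative-risk-aversion (CRRA) utility, which for a risk-aversion parameter σ > 0 takes the familiar form:

\[
u(c) \;=\;
\begin{cases}
\dfrac{c^{1-\sigma}}{1-\sigma}, & \sigma \neq 1,\\[1ex]
\log c, & \sigma = 1 .
\end{cases}
\]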
A new approach to the solution of optimal stopping problems for one-dimensional diffusions is developed. It arises by imbedding the stochastic problem in a linear programming problem over a space of measures. Optimizing over a smaller class of stopping rules provides a lower bound on the value of the original problem. Then the weak duality of a restricted form of the dual linear program provides an upper bound on the value. An explicit formula for the reward earned using a two-point hitting time stopping rule allows us to prove strong duality between these problems and, therefore, allows us to either optimize over these simpler stopping rules or to solve the restricted dual program. Each optimization problem is parameterized by the initial value of the diffusion and, thus, we are able to construct the value function by solving the family of optimization problems. This methodology requires little regularity of the terminal reward function. When the reward function is smooth, the optimal stopping locations are shown to satisfy the smooth pasting principle. The procedure is illustrated using two examples.
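As background for the two-point hitting rules mentioned above: in the undiscounted case, the reward earned by stopping a regular diffusion at the first exit time τ_{a,b} from an interval (a, b) has a classical closed form in terms of the scale function S; whether the paper's explicit formula reduces to this expression is an assumption here, since discounting involves modified scale-type functions.

\[
E_x\!\left[g\!\left(X_{\tau_{a,b}}\right)\right]
\;=\; g(a)\,\frac{S(b)-S(x)}{S(b)-S(a)} \;+\; g(b)\,\frac{S(x)-S(a)}{S(b)-S(a)},
\qquad a \le x \le b .
\]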
This paper introduces a method of optimization in infinite-horizon economies based on the theory of correspondences. The proposed approach allows us to study time-separable and non-time-separable dynamic economic models without resorting to fixed point theorems or transversality conditions. When our technique is applied to the standard time-separable model, it provides an alternative and straightforward way to derive the common recursive formulation of these models by means of Bellman equations.
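The "common recursive formulation" referenced above is, in the standard time-separable case with period return F, feasibility correspondence Γ, and discount factor β ∈ (0,1), the Bellman equation:

\[
V(x) \;=\; \sup_{y \in \Gamma(x)} \bigl\{ F(x,y) + \beta\, V(y) \bigr\} .
\]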
An alternative approach to the analysis and the numerical approximation of ODEs, using a variational framework, is presented. It is based on the natural and elementary idea of minimizing the residual of the differential equation measured in a usual Lp norm. Typical existence results for Cauchy problems can thus be recovered, and finer sets of assumptions for existence are made explicit. We treat, in particular, the cases of an explicit ODE and a differential inclusion. This approach also suggests a whole strategy for approximating the solution numerically. It is only briefly indicated here, as it will be pursued systematically and in a much broader fashion in a subsequent paper.
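In the notation used here purely for illustration (the paper's own symbols may differ), the variational idea is to minimize, over admissible curves x with x(0) = x₀, the residual functional

\[
E_p(x) \;=\; \int_0^T \bigl| \dot x(t) - F\bigl(t, x(t)\bigr) \bigr|^{p}\, dt ,
\]

so that any minimizer attaining the value zero is a solution of the Cauchy problem ẋ = F(t, x), x(0) = x₀.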
We consider a single-server queueing system at which customers arrive according to a Poisson process. The service times of the customers are independent and follow a Coxian distribution of order r. The system is subject to costs per unit time for holding a customer in the system. We give a closed-form expression for the average cost and the corresponding value function. The result can be used to derive nearly optimal policies in controlled queueing systems in which the service times are not necessarily Markovian, by performing a single step of policy iteration. We illustrate this in the model where a controller has to route to several single-server queues. Numerical experiments show that the improved policy has a close-to-optimal value.
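The following sketch illustrates how such value functions feed into a single step of policy iteration for dispatching. It is an assumption-laden toy: instead of the Coxian value function derived in the paper, it uses the well-known M/M/1 relative value function v(n) = c·n(n+1)/(2(μ − λ)) for holding cost rate c, and routes each arrival to the queue with the smallest marginal cost.

```python
# Toy one-step policy improvement for dispatching to parallel single-server queues.
# Uses the standard M/M/1 relative value function as a stand-in for the Coxian
# value function of the paper; all parameters below are illustrative assumptions.

def mm1_relative_value(n, lam, mu, c=1.0):
    # v(n) - v(0) = c * n * (n + 1) / (2 * (mu - lam)) for an M/M/1 FCFS queue
    # with holding cost rate c per customer per unit time (requires mu > lam).
    assert mu > lam, "queue must be stable"
    return c * n * (n + 1) / (2.0 * (mu - lam))

def dispatch(queue_lengths, lams, mus, costs):
    # Route the arriving customer to the queue with the smallest increase in
    # expected future cost, i.e. the smallest v_i(n_i + 1) - v_i(n_i).
    deltas = [
        mm1_relative_value(n + 1, lam, mu, c) - mm1_relative_value(n, lam, mu, c)
        for n, lam, mu, c in zip(queue_lengths, lams, mus, costs)
    ]
    return min(range(len(deltas)), key=deltas.__getitem__)

# Example: two queues, the second currently longer but served twice as fast.
print(dispatch(queue_lengths=[2, 4], lams=[0.5, 0.5], mus=[1.0, 2.0], costs=[1.0, 1.0]))
```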
We consider an optimal control problem of Mayer type and prove that, under suitable conditions on the system, the value function is differentiable along optimal trajectories, except possibly at the endpoints. We provide counterexamples to show that this property may fail to hold if some of our conditions are violated. We then apply our regularity result to derive optimality conditions for the trajectories of the system.
We describe an algorithm for computing the value function for “all source, single destination” discrete-time nonlinear optimal control problems, together with approximations of the associated globally optimal control strategies. The method is based on a set-oriented approach to the discretization of the problem in combination with graph-theoretic techniques. The central idea is that a discretization of the phase space of the given problem leads to an (all source, single destination) shortest path problem on a finite graph. The method is illustrated by two numerical examples, namely a single pendulum on a cart and a parametrically driven inverted double pendulum.
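A minimal sketch of the set-oriented idea, on a made-up one-dimensional system rather than the pendulum examples of the paper: discretize the state space into cells, turn one-step transitions under sampled controls into weighted edges, and solve the resulting all-source, single-destination shortest path problem with Dijkstra's algorithm. The dynamics, costs, and grid sizes below are illustrative assumptions.

```python
import heapq
import math

# Discretize the state space [-1, 1] into cells (the "boxes" of the set-oriented approach).
N_CELLS = 101
xs = [-1.0 + 2.0 * i / (N_CELLS - 1) for i in range(N_CELLS)]   # cell centres
controls = [-0.1, -0.05, 0.0, 0.05, 0.1]

def cell_of(x):
    # Index of the nearest cell centre, clamped to the grid.
    i = round((x + 1.0) * (N_CELLS - 1) / 2.0)
    return min(max(int(i), 0), N_CELLS - 1)

def step(x, u):
    # Toy discrete-time nonlinear dynamics (hypothetical, not from the paper).
    return x + 0.1 * math.sin(x) + u

def stage_cost(x, u):
    return x * x + u * u + 1e-3          # strictly positive edge weights

# Reversed edge list: incoming[j] holds (source cell, cost, control) triples.
incoming = [[] for _ in range(N_CELLS)]
for i, x in enumerate(xs):
    for u in controls:
        j = cell_of(step(x, u))
        incoming[j].append((i, stage_cost(x, u), u))

# "All source, single destination" shortest paths to the cell containing 0
# give the approximate value function and a feedback control per cell.
dest = cell_of(0.0)
value = [math.inf] * N_CELLS
policy = [None] * N_CELLS
value[dest] = 0.0
pq = [(0.0, dest)]
while pq:
    d, j = heapq.heappop(pq)
    if d > value[j]:
        continue
    for i, w, u in incoming[j]:
        if d + w < value[i]:
            value[i], policy[i] = d + w, u
            heapq.heappush(pq, (value[i], i))

print("approximate value at x = 0.8:", value[cell_of(0.8)])
```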
We study the multiserver queue with Poisson arrivals and identical, independent servers with exponentially distributed service times. Customers arriving at the system are admitted or rejected according to a fixed threshold policy. Moreover, the system is subject to holding, waiting, and rejection costs. We give a closed-form expression for the average cost and the value function of this multiserver queue. The result is then used in a single step of policy iteration in the model where a controller has to route to several finite-buffer queues with multiple servers. We show numerically that the improved policy has a close-to-optimal value.
The maximality principle [6] is shown to be valid in some examples of discounted optimal stopping problems for the maximum process. In each of these examples, explicit formulas for the value functions are derived and the optimal stopping times are displayed. In particular, in the framework of the Black–Scholes model, the fair prices of two lookback options with infinite horizon are calculated. The main aim of the paper is to show that in each considered example the optimal stopping boundary satisfies the maximality principle and that the value function can be determined explicitly.
In this paper, we are concerned with the basic problem defined in [9]. Formulas for ∂V(0) and ∂^∞V(0), respectively the generalized and the asymptotic gradient of the value function at zero, corresponding to an L2-additive perturbation of the dynamics, are given. Under the normality condition, ∂V(0) turns out to be a compact subset of L2, formed entirely of arcs, and V is locally finite and Lipschitz at 0. Moreover, estimates of the generalized directional derivative and of the Dini derivative of V at 0 are derived. Supplementary conditions imply that the Dini derivative of V at 0 exists and that V is in fact strictly differentiable at this point.
In this study we examine repairable systems with random lifetimes. Upon failure, a maintenance action, specifying the degree of repair, is taken by a controller. The objective is to determine an age-dependent maintenance strategy which minimizes the total expected discounted cost over an infinite planning horizon. Using several properties of the optimal policy derived in this study, we propose analytical and numerical methods for determining the optimal maintenance strategy. In order to gain better insight into the structure and nature of the optimal policy and to illustrate the computational procedures, a numerical example is analysed. The proposed maintenance model opens a new research avenue in the area of reliability, with interesting theoretical issues and a wide range of potential applications in fields such as product design, inventory systems for spare parts, and the management of maintenance crews.