The heart of prospect theory is the value function, which proposes that the carriers of value are positive or negative changes from a reference point. Daniel Kahneman observed that if prospect theory had a flag, the value function would be drawn on it. The function is nonlinear, reflecting diminishing sensitivity to magnitude. When describing how human lives are valued, the function exposes profound incoherence. An individual life is highly valued and thus vigorously protected if it is the only life at risk. But that life loses its value when it is one of many endangered by a larger tragedy. Beyond this insensitivity, the function may actually decline when the many lives at risk become mere numbers. The more who die, the less we care. Implications of this deadly ‘arithmetic of compassion’ for understanding and managing the risk from nuclear weapons are briefly discussed.
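For orientation, the canonical value function of cumulative prospect theory (Tversky and Kahneman, 1992) is often written as follows; the abstract above does not commit to this exact parameterization, so take it as background rather than as the form used in the article:

\[
v(x) =
\begin{cases}
x^{\alpha}, & x \ge 0,\\
-\lambda(-x)^{\beta}, & x < 0,
\end{cases}
\qquad \alpha \approx \beta \approx 0.88,\quad \lambda \approx 2.25,
\]

so that both gains and losses are valued with diminishing sensitivity, and losses loom larger than gains.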
Reinforcement learning (RL) is a computational framework in which an active agent learns behaviors on the basis of scalar reward feedback. The theory of reinforcement learning was developed in the artificial intelligence community, drawing on intuitions from psychology and animal learning theory and on a mathematical foundation in control theory. It has been successfully applied to tasks such as game playing and robot control. Reinforcement learning also gives a theoretical account of behavioral learning in humans and animals and of the underlying brain mechanisms, such as dopamine signaling and the basal ganglia circuit. It thus serves as a “common language” in which engineers, biologists, and cognitive scientists can exchange their problems and findings about goal-directed behaviors. This chapter introduces the basic theoretical framework of reinforcement learning and reviews its impact in artificial intelligence, neuroscience, and cognitive science.
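As a minimal illustration of the framework described above, the sketch below implements tabular Q-learning on a made-up five-state chain; the environment, the reward of 1 at the right end, and all hyperparameters are assumptions chosen for the example, not taken from the chapter.

```python
import random

# Hypothetical 5-state chain; reaching the rightmost state yields reward 1.
N_STATES, ACTIONS = 5, (-1, +1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    # Deterministic toy dynamics with a terminal rewarding state on the right.
    s_next = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy action selection.
        a = (random.choice(ACTIONS) if random.random() < EPSILON
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s_next, r = step(s, a)
        # Temporal-difference (Q-learning) update toward r + gamma * max_a' Q(s', a').
        target = r + GAMMA * max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s_next

# Greedy policy learned from the scalar reward signal alone.
print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)})
```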
In the economics literature, there are two dominant approaches to solving models with optimal experimentation (also called active learning): the first is based on the value function and the second on an approximation method. In principle, the value function approach is preferred. However, it suffers from the curse of dimensionality and is applicable only to small problems with a limited number of policy variables. The approximation method can handle a computationally larger class of models, but may produce results that deviate from the optimal solution. Our simulations indicate that when the effects of learning are limited, the differences may be small. However, when there is sufficient scope for learning, the value function solution appears more aggressive in its use of the policy variable.
This paper pioneers a Freidlin–Wentzell approach to stochastic impulse control of exchange rates when the central bank desires to maintain a target zone. Pressure to stimulate the economy forces the bank to implement a diffusion monetary policy involving Freidlin–Wentzell perturbations indexed by a parameter ε ∈ [0,1]. If ε = 0, the policy keeps exchange rates in the target zone for all times t ≥ 0. When ε > 0, exchange rates continually exit the target zone almost surely, triggering central bank interventions which force currencies back into the zone or abandonment of all targets. Interventions and target zone deviations are costly, motivating the bank to minimize these joint costs for any ε ∈ [0,1]. We prove convergence of the value functions as ε → 0, yielding a value function approximation for small ε. Via sample path analysis and cost function bounds, intervention followed by target zone abandonment emerges as the optimal policy.
We prove the continuity and the Hölder equivalence, with respect to a Euclidean distance, of the value function associated with the L1 cost of the control-affine system q̇ = f₀(q) + ∑_{j=1}^m u_j f_j(q) satisfying the strong Hörmander condition. This is done by proving a result in the same spirit as the Ball–Box theorem for driftless (or sub-Riemannian) systems. The techniques used are based on a reduction of the control-affine system to a linear but time-dependent one, for which we are able to define a generalization of the nilpotent approximation and through which we derive estimates for the shape of the reachable sets. Finally, we also prove the continuity of the value function associated with the L1 cost of time-dependent systems of the form q̇ = ∑_{j=1}^m u_j f_j^t(q).
We consider a single-server queue with Poisson input operating under first-come–first-served (FCFS) or last-come–first-served (LCFS) disciplines. The service times of the customers are independent and obey a general distribution. The system is subject to costs for holding a customer per unit of time, which can be customer specific or customer class specific. We give general expressions for the corresponding value functions, which have elementary compact forms, similar to the Pollaczek–Khinchine mean value formulae. The results generalize earlier work where similar expressions have been obtained for specific service time distributions. The obtained value functions can be readily applied to develop nearly optimal dispatching policies for a broad range of systems with parallel queues, including multiclass scenarios and the cases where service time estimates are available.
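For context, the Pollaczek–Khinchine mean value formula that the abstract's value functions are said to resemble gives, for an M/G/1 FCFS queue with arrival rate λ, service time S, and load ρ = λE[S] < 1, the mean waiting time (the paper's value functions themselves are not reproduced here):

\[
E[W] \;=\; \frac{\lambda\, E[S^2]}{2\,(1-\rho)} .
\]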
This paper presents a closed-form characterization of the allocation of resources in an overlapping generations model of two-sided, partial altruism. Three assumptions are made: (i) parents and children play Markov strategies, (ii) utility takes the CRRA form, and (iii) the income of children is stochastic but proportional to the saving of parents. In families where children are rich relative to their parents, saving rates—measured as a function of the family's total resources—are higher than when children are poor relative to their parents. Income redistribution from the old to the young, therefore, leads to an increase in aggregate saving.
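Assumption (ii) refers to the standard constant-relative-risk-aversion (CRRA) utility, which for a risk-aversion parameter σ > 0 takes the familiar form:

\[
u(c) \;=\;
\begin{cases}
\dfrac{c^{1-\sigma}}{1-\sigma}, & \sigma \neq 1,\\[1ex]
\log c, & \sigma = 1 .
\end{cases}
\]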
A new approach to the solution of optimal stopping problems for one-dimensional diffusions is developed. It arises by imbedding the stochastic problem in a linear programming problem over a space of measures. Optimizing over a smaller class of stopping rules provides a lower bound on the value of the original problem. Then the weak duality of a restricted form of the dual linear program provides an upper bound on the value. An explicit formula for the reward earned using a two-point hitting time stopping rule allows us to prove strong duality between these problems and, therefore, allows us to either optimize over these simpler stopping rules or to solve the restricted dual program. Each optimization problem is parameterized by the initial value of the diffusion and, thus, we are able to construct the value function by solving the family of optimization problems. This methodology requires little regularity of the terminal reward function. When the reward function is smooth, the optimal stopping locations are shown to satisfy the smooth pasting principle. The procedure is illustrated using two examples.
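As background for the two-point hitting rules mentioned above: in the undiscounted case, the reward earned by stopping a regular diffusion at the first exit time τ_{a,b} from an interval (a, b) has a classical closed form in terms of the scale function S; whether the paper's explicit formula reduces to this expression is an assumption here, since discounting involves modified scale-type functions.

\[
E_x\!\left[g\!\left(X_{\tau_{a,b}}\right)\right]
\;=\; g(a)\,\frac{S(b)-S(x)}{S(b)-S(a)} \;+\; g(b)\,\frac{S(x)-S(a)}{S(b)-S(a)},
\qquad a \le x \le b .
\]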
This paper introduces a method of optimization in infinite-horizon economies based on the theory of correspondences. The proposed approach allows us to study time-separable and non-time-separable dynamic economic models without resorting to fixed point theorems or transversality conditions. When our technique is applied to the standard time-separable model, it provides an alternative and straightforward way to derive the common recursive formulation of these models by means of Bellman equations.
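The "common recursive formulation" referenced above is, in the standard time-separable case with period return F, feasibility correspondence Γ, and discount factor β ∈ (0,1), the Bellman equation:

\[
V(x) \;=\; \sup_{y \in \Gamma(x)} \bigl\{ F(x,y) + \beta\, V(y) \bigr\} .
\]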
An alternative approach to the analysis and the numerical approximation of ODEs, using a variational framework, is presented. It is based on the natural and elementary idea of minimizing the residual of the differential equation measured in a usual Lp norm. Typical existence results for Cauchy problems can thus be recovered, and finer sets of assumptions for existence are made explicit. We treat, in particular, the cases of an explicit ODE and a differential inclusion. This approach also suggests a whole strategy for approximating the solution numerically. It is only briefly indicated here, as it will be pursued systematically and in a much broader fashion in a subsequent paper.
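In the notation used here purely for illustration (the paper's own symbols may differ), the variational idea is to minimize, over admissible curves x with x(0) = x₀, the residual functional

\[
E_p(x) \;=\; \int_0^T \bigl| \dot x(t) - F\bigl(t, x(t)\bigr) \bigr|^{p}\, dt ,
\]

so that any minimizer attaining the value zero is a solution of the Cauchy problem ẋ = F(t, x), x(0) = x₀.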
We consider a single-server queueing system at which customers arrive according to a Poisson process. The service times of the customers are independent and follow a Coxian distribution of order r. The system is subject to costs per unit time for holding a customer in the system. We give a closed-form expression for the average cost and the corresponding value function. The result can be used to derive nearly optimal policies in controlled queueing systems in which the service times are not necessarily Markovian, by performing a single step of policy iteration. We illustrate this in the model where a controller has to route to several single-server queues. Numerical experiments show that the improved policy has a close-to-optimal value.
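The following sketch illustrates how such value functions feed into a single step of policy iteration for dispatching. It is an assumption-laden toy: instead of the Coxian value function derived in the paper, it uses the well-known M/M/1 relative value function v(n) = c·n(n+1)/(2(μ − λ)) for holding cost rate c, and routes each arrival to the queue with the smallest marginal cost.

```python
# Toy one-step policy improvement for dispatching to parallel single-server queues.
# Uses the standard M/M/1 relative value function as a stand-in for the Coxian
# value function of the paper; all parameters below are illustrative assumptions.

def mm1_relative_value(n, lam, mu, c=1.0):
    # v(n) - v(0) = c * n * (n + 1) / (2 * (mu - lam)) for an M/M/1 FCFS queue
    # with holding cost rate c per customer per unit time (requires mu > lam).
    assert mu > lam, "queue must be stable"
    return c * n * (n + 1) / (2.0 * (mu - lam))

def dispatch(queue_lengths, lams, mus, costs):
    # Route the arriving customer to the queue with the smallest increase in
    # expected future cost, i.e. the smallest v_i(n_i + 1) - v_i(n_i).
    deltas = [
        mm1_relative_value(n + 1, lam, mu, c) - mm1_relative_value(n, lam, mu, c)
        for n, lam, mu, c in zip(queue_lengths, lams, mus, costs)
    ]
    return min(range(len(deltas)), key=deltas.__getitem__)

# Example: two queues, the second currently longer but served twice as fast.
print(dispatch(queue_lengths=[2, 4], lams=[0.5, 0.5], mus=[1.0, 2.0], costs=[1.0, 1.0]))
```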
We consider an optimal control problem of Mayer type and prove that, under suitable conditions on the system, the value function is differentiable along optimal trajectories, except possibly at the endpoints. We provide counterexamples to show that this property may fail to hold if some of our conditions are violated. We then apply our regularity result to derive optimality conditions for the trajectories of the system.
We describe an algorithm for computing the value function for “all source, single destination” discrete-time nonlinear optimal control problems, together with approximations of the associated globally optimal control strategies. The method is based on a set-oriented approach to the discretization of the problem in combination with graph-theoretic techniques. The central idea is that a discretization of the phase space of the given problem leads to an (all source, single destination) shortest path problem on a finite graph. The method is illustrated by two numerical examples, namely a single pendulum on a cart and a parametrically driven inverted double pendulum.
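A minimal sketch of the set-oriented idea, on a made-up one-dimensional system rather than the pendulum examples of the paper: discretize the state space into cells, turn one-step transitions under sampled controls into weighted edges, and solve the resulting all-source, single-destination shortest path problem with Dijkstra's algorithm. The dynamics, costs, and grid sizes below are illustrative assumptions.

```python
import heapq
import math

# Discretize the state space [-1, 1] into cells (the "boxes" of the set-oriented approach).
N_CELLS = 101
xs = [-1.0 + 2.0 * i / (N_CELLS - 1) for i in range(N_CELLS)]   # cell centres
controls = [-0.1, -0.05, 0.0, 0.05, 0.1]

def cell_of(x):
    # Index of the nearest cell centre, clamped to the grid.
    i = round((x + 1.0) * (N_CELLS - 1) / 2.0)
    return min(max(int(i), 0), N_CELLS - 1)

def step(x, u):
    # Toy discrete-time nonlinear dynamics (hypothetical, not from the paper).
    return x + 0.1 * math.sin(x) + u

def stage_cost(x, u):
    return x * x + u * u + 1e-3          # strictly positive edge weights

# Reversed edge list: incoming[j] holds (source cell, cost, control) triples.
incoming = [[] for _ in range(N_CELLS)]
for i, x in enumerate(xs):
    for u in controls:
        j = cell_of(step(x, u))
        incoming[j].append((i, stage_cost(x, u), u))

# "All source, single destination" shortest paths to the cell containing 0
# give the approximate value function and a feedback control per cell.
dest = cell_of(0.0)
value = [math.inf] * N_CELLS
policy = [None] * N_CELLS
value[dest] = 0.0
pq = [(0.0, dest)]
while pq:
    d, j = heapq.heappop(pq)
    if d > value[j]:
        continue
    for i, w, u in incoming[j]:
        if d + w < value[i]:
            value[i], policy[i] = d + w, u
            heapq.heappush(pq, (value[i], i))

print("approximate value at x = 0.8:", value[cell_of(0.8)])
```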
We study the multiserver queue with Poisson arrivals and identical, independent servers with exponentially distributed service times. Customers arriving at the system are admitted or rejected according to a fixed threshold policy. Moreover, the system is subject to holding, waiting, and rejection costs. We give a closed-form expression for the average cost and the value function of this multiserver queue. The result is then used in a single step of policy iteration in the model where a controller has to route to several finite-buffer queues with multiple servers. We show numerically that the improved policy has a close-to-optimal value.
The maximality principle [6] is shown to be valid in some examples of discounted optimal stopping problems for the maximum process. In each of these examples, explicit formulas for the value functions are derived and the optimal stopping times are displayed. In particular, in the framework of the Black–Scholes model, the fair prices of two lookback options with infinite horizon are calculated. The main aim of the paper is to show that in each considered example the optimal stopping boundary satisfies the maximality principle and that the value function can be determined explicitly.
In this paper, we are concerned with the basic problem defined in [9]. Formulas for ∂V(0) and ∂^∞V(0), respectively the generalized and the asymptotic gradient of the value function at zero, corresponding to an L2-additive perturbation of the dynamics, are given. Under the normality condition, ∂V(0) turns out to be a compact subset of L2, formed entirely of arcs, and V is locally finite and Lipschitz at 0. Moreover, estimates of the generalized directional derivative and of the Dini derivative of V at 0 are derived. Supplementary conditions imply that the Dini derivative of V at 0 exists and that V is in fact strictly differentiable at this point.
In this study we examine repairable systems with random lifetimes. Upon failure, a maintenance action, specifying the degree of repair, is taken by a controller. The objective is to determine an age-dependent maintenance strategy which minimizes the total expected discounted cost over an infinite planning horizon. Using several properties of the optimal policy derived in this study, we propose analytical and numerical methods for determining the optimal maintenance strategy. In order to gain better insight into the structure and nature of the optimal policy and to illustrate the computational procedures, a numerical example is analysed. The proposed maintenance model opens a new research avenue in the area of reliability, with interesting theoretical issues and a wide range of potential applications in fields such as product design, inventory systems for spare parts, and the management of maintenance crews.