Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 Foundations of Smooth Optimization
- 3 Descent Methods
- 4 Gradient Methods Using Momentum
- 5 Stochastic Gradient
- 6 Coordinate Descent
- 7 First-Order Methods for Constrained Optimization
- 8 Nonsmooth Functions and Subgradients
- 9 Nonsmooth Optimization Methods
- 10 Duality and Algorithms
- 11 Differentiation and Adjoints
- Appendix
- Bibliography
- Index
11 - Differentiation and Adjoints
Published online by Cambridge University Press: 31 March 2022
Summary
First derivatives (gradients) are needed for most of the algorithms described in the book. Here, we describe how these gradients can be computed efficiently for functions of the form arising in deep learning. The reverse mode of automatic differentiation, often called “back-propagation” in the machine learning community, is described for several problems with the nested-composite and progressive structure that arises in neural network training. We provide another perspective on these techniques, based on a constrained optimization formulation and the optimality conditions for this formulation.
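As a rough illustration of the reverse mode on a nested-composite function, the sketch below (not taken from the chapter) differentiates a small chain of layers by a forward sweep that stores the intermediate vectors, followed by a backward sweep of vector-Jacobian products. The layer widths, the tanh nonlinearity, and the quadratic output function are illustrative assumptions, not the chapter's examples.

```python
import numpy as np

def forward(x, Ws, bs):
    """Forward sweep: apply each layer f_i(z) = tanh(W_i z + b_i) and store every intermediate."""
    zs = [x]
    for W, b in zip(Ws, bs):
        zs.append(np.tanh(W @ zs[-1] + b))
    loss = 0.5 * np.dot(zs[-1], zs[-1])               # output function phi(z_L) = 0.5*||z_L||^2
    return loss, zs

def gradient(x, Ws, bs):
    """Reverse sweep: propagate the adjoint vector back through each layer."""
    _, zs = forward(x, Ws, bs)
    p = zs[-1]                                        # gradient of phi at z_L is z_L itself
    for W, b, z_prev in zip(reversed(Ws), reversed(bs), reversed(zs[:-1])):
        pre = W @ z_prev + b                          # pre-activation of this layer
        p = W.T @ ((1.0 - np.tanh(pre) ** 2) * p)     # vector-Jacobian product J_i^T p
    return p                                          # gradient of the loss with respect to x

# Sanity check against central finite differences on a small random instance.
rng = np.random.default_rng(0)
dims = [4, 3, 2]                                      # layer widths (an arbitrary choice)
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
bs = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]
x = rng.standard_normal(dims[0])

g = gradient(x, Ws, bs)
eps = 1e-6
g_fd = np.array([(forward(x + eps * e, Ws, bs)[0] - forward(x - eps * e, Ws, bs)[0]) / (2 * eps)
                 for e in np.eye(dims[0])])
print(np.allclose(g, g_fd, atol=1e-5))                # True if the reverse sweep is correct
```

The finite-difference comparison at the end is only a sanity test of the reverse sweep; the cost of the backward pass is a small multiple of the cost of evaluating the function itself, which is what makes reverse-mode differentiation attractive for the progressive structures described in the chapter.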
- Type: Chapter
- Information: Optimization for Data Analysis, pp. 188-199
- Publisher: Cambridge University Press
- Print publication year: 2022