Chapter 9 Stochastic Optimization

Numerical optimization involves several tradeoffs, one of them being the exploration-exploitation tradeoff. On the one hand, the objective function must be explored thoroughly enough to build an adequate model of it. On the other hand, the model should be exploited so that the minimum is found quickly. Another tradeoff is between the accuracy of the model and the time it takes to compute with it.

The optimization algorithms considered in Chapters @ref{numopt} and @ref{em} work on all available data and take deterministic steps in each iteration. The models are based on accurate local derivative computations, which can be demanding for large data sets. Moreover, the algorithms greedily exploit the local model obtained from derivatives, but they do little exploration.

By incorporating randomness into optimization algorithms it is possible to lower the computational costs and make the algorithms more exploratory. This can be done in various ways. Classical examples of stochastic optimization algorithms are simulated annealing and evolutionary algorithms, which incorporate randomness into the iterative steps with the purpose of exploring the objective function better than a deterministic algorithm is able to, in particular to avoid getting stuck in saddle points and to escape local minima. Stochastic gradient descent is another example, which computes descent directions from approximate gradients based on random subsets of the data.
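
To make the idea of gradients computed from random subsets concrete, the following is a minimal sketch of mini-batch stochastic gradient descent for a least squares problem. The simulated data, the fixed learning rate and the batch size are illustrative choices and not part of the chapter.

```python
# Minimal sketch of mini-batch stochastic gradient descent (SGD)
# for least squares regression on simulated data.
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = X beta + noise (illustrative example).
n, p = 10_000, 5
X = rng.normal(size=(n, p))
beta_true = np.arange(1, p + 1, dtype=float)
y = X @ beta_true + rng.normal(size=n)

def sgd(X, y, batch_size=32, rate=0.01, epochs=20):
    """Mini-batch SGD for the objective (1 / (2n)) * ||y - X beta||^2."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(epochs):
        # One pass over the data in a random order.
        for idx in np.array_split(rng.permutation(n), n // batch_size):
            xb, yb = X[idx], y[idx]
            # Approximate gradient computed from the random subset only.
            grad = -xb.T @ (yb - xb @ beta) / len(idx)
            beta -= rate * grad
    return beta

beta_hat = sgd(X, y)
print(beta_hat)  # should be close to beta_true = (1, 2, 3, 4, 5)
```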

The literature on stochastic optimization is huge, and this chapter will only cover a few examples of particular relevance to statistics. Stochastic gradient descent has recently become the standard solution for large-scale optimization. In situations ..