Chapter 2 Density estimation

This chapter is on nonparametric density estimation. A classical nonparametric estimator of a density is the histogram, which provides discontinuous and piecewise constant estimates. The focus in this chapter is on some of the alternatives that provide continuous or even smooth estimates instead.

Kernel methods form an important class of smooth density estimators, as implemented by the R function density(). These estimators are essentially locally weighted averages, and their computation is straightforward in theory. In practice, however, different choices of how to implement the computations can have a big effect on the actual computation time, and the implementation of kernel density estimators will illustrate three points:

  • if possible, choose vectorized implementations in R,
  • if a small loss in accuracy is acceptable, an approximate solution can be orders of magnitude faster than a literal implementation,
  • the time it takes to numerically evaluate different elementary functions can depend a lot on the function and how you implement the computation.

The first point is emphasized because vectorization results in implementations that are short, expressive and easy to understand just as much as it typically results in computationally more efficient implementations; the sketch below shows a case where vectorization is natural. Note also that not every computation can be vectorized in a beneficial way, and one should never jump through hoops to vectorize a computation.
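To make the first point concrete, the following is a minimal sketch of a Gaussian kernel density estimator, vectorized over the evaluation grid via outer(). The function name kern_dens, the bandwidth value and the simulated data are illustrative choices, not anything fixed by the text.

    ## Gaussian kernel density estimator, vectorized via outer()
    kern_dens <- function(x, h, m = 512) {
      rg <- range(x)
      ## Evaluation grid extended three bandwidths beyond the data range,
      ## mimicking the default grid used by density()
      grid <- seq(rg[1] - 3 * h, rg[2] + 3 * h, length.out = m)
      ## outer() forms the matrix of differences grid[j] - x[i] in one
      ## vectorized step; rowMeans() then gives the locally weighted averages
      y <- rowMeans(dnorm(outer(grid, x, "-") / h)) / h
      list(x = grid, y = y)
    }

    x <- rnorm(100)
    est <- kern_dens(x, h = 0.3)

The result agrees closely, though not exactly, with density(x, bw = 0.3), which bins the data and uses the fast Fourier transform, an example of the approximate but much faster solutions referred to in the second point.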

Kernel methods rely on one or more regularization parameters that must be selected to strike the right balance between adapting to the data and not adapting too much to random variation in the data. Choosing the right amount of regularization is just as important as choosing the method in the first place; it may, in fact, be more important. We do not have a complete implementation of a nonparametric estimator until we have implemented a data-driven, automatic way of choosing the amount of regularization. Implementing only the computations for evaluating a kernel estimator, say, and leaving it entirely to the user to choose the bandwidth is a job half done. Methods and implementations for choosing the bandwidth are therefore treated in some detail in this chapter.
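For reference, base R's stats package already contains several data-driven bandwidth selectors, which can also be selected by name through the bw argument of density(). A small illustration:

    x <- rnorm(100)
    bw.nrd0(x)  # Silverman-type rule of thumb, the default in density()
    bw.ucv(x)   # unbiased (least squares) cross-validation
    bw.SJ(x)    # Sheather-Jones plug-in selector
    plot(density(x, bw = "SJ"))  # a character bw selects a method by name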

In the final section a likelihood analysis is carried out. This is done to further clarify why regularized estimators are needed to avoid overfitting to the data, and why there is in general no nonparametric maximum likelihood estimator of a density. Regularization of the likelihood can be achieved by constraining the density estimates to belong to a family of increasingly flexible parametric densities that are fitted to data; this is known as the method of sieves. Another approach is based on basis expansions, but in either case, automatic selection of the amount of regularization is just as important as it is for kernel methods.
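To hint at what the method of sieves can look like in practice, here is a minimal sketch, with illustrative names and simulated data, of maximum likelihood estimation within a two-component Gaussian mixture, one step up in flexibility from a single Gaussian density.

    set.seed(1)
    x <- c(rnorm(100, -2), rnorm(100, 2))
    ## Negative log-likelihood of a two-component Gaussian mixture,
    ## parametrized so that optim() can search over all of R^5
    neg_loglik <- function(par) {
      p <- plogis(par[1])  # mixing proportion mapped into (0, 1)
      -sum(log(p * dnorm(x, mean = par[2], sd = exp(par[4])) +
               (1 - p) * dnorm(x, mean = par[3], sd = exp(par[5]))))
    }
    fit <- optim(c(0, -1, 1, 0, 0), neg_loglik)

Letting the number of mixture components grow with the sample size gives an increasingly flexible family, and deciding how fast it should grow is exactly the kind of regularization selection problem discussed above.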