# Chapter 2 Density estimation

This chapter is on nonparametric density estimation. A classical nonparametric estimator of a density is the histogram, which provides discontinuous and piecewise constant estimates. The focus in this chapter is on some of the alternatives that provide continuous or even smooth estimates instead.

*Kernel methods* form an important class of smooth density estimators as
implemented by the R function `density()`. These estimators are essentially
just locally weighted averages, and their computation is relatively
straightforward in theory. In practice, different choices of how to implement
the computations can, however, have a big effect on the actual computation time,
and the implementation of kernel density estimators will illustrate three points:

- if possible, choose vectorized implementations in R,
- if a small loss in accuracy is acceptable, an approximate solution can be orders of magnitude faster than a literal implementation,
- the time it takes to numerically evaluate different elementary functions can depend a lot on the function and how you implement the computation.

The first point is emphasized because it results in implementations that are short, expressive and easier to understand, just as much as it typically results in computationally more efficient implementations. Note also that not every computation can be vectorized in a beneficial way, and one should never jump through hoops to vectorize a computation.
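To preview the first point, a literal loop-based implementation of a Gaussian kernel density estimator can be contrasted with a vectorized one using `outer()`. This is only an illustrative sketch; the data, grid and bandwidth below are arbitrary choices, not the chapter's eventual implementation.

```r
# Gaussian kernel density estimate f_hat(x) = (1/(n*h)) * sum_j K((x - x_j)/h)
# evaluated on a grid, first with an explicit loop over grid points.
kern_loop <- function(xs, x, h) {
  y <- numeric(length(xs))
  for (i in seq_along(xs))
    y[i] <- mean(dnorm((xs[i] - x) / h)) / h
  y
}

# Vectorized version: outer() forms the full matrix of scaled
# differences in one step, and rowMeans() averages over the data.
kern_vec <- function(xs, x, h) {
  rowMeans(dnorm(outer(xs, x, "-") / h)) / h
}

x  <- rnorm(100)
xs <- seq(-3, 3, length.out = 512)
f1 <- kern_loop(xs, x, 0.3)
f2 <- kern_vec(xs, x, 0.3)
all.equal(f1, f2)  # the two implementations agree numerically
```

The vectorized version is also closer to the mathematical formula, which is part of the point: vectorization often improves readability and speed at the same time.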

Kernel methods rely on one or more *regularization parameters* that must be
selected to achieve the right balance of adapting
to data without adapting too much to the random variation in the data.
Choosing the right amount of regularization is just as important as choosing
the method to use in the first place. It may, in fact, be more important.
We actually do not have a complete implementation of a nonparametric estimator
until we have implemented a data-driven and automatic way of choosing the
amount of regularization. Implementing only the computations for
evaluating a kernel estimator, say, and leaving it completely
to the user to choose the bandwidth is a job half done. Methods and implementations
for choosing the bandwidth are therefore treated in some detail in this chapter.
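As a concrete instance of automatic bandwidth selection, R already ships several data-driven selectors that can be passed to `density()` via its `bw` argument; Silverman's rule of thumb, implemented by `bw.nrd0()`, is the default. The sketch below recomputes that rule by hand (omitting `bw.nrd0()`'s fallbacks for degenerate data) and compares it to the built-in.

```r
# Silverman's rule of thumb, as documented for bw.nrd0():
#   0.9 * min(sd(x), IQR(x) / 1.34) * n^(-1/5)
# Degenerate-data fallbacks in bw.nrd0() are omitted here.
silverman <- function(x) {
  n <- length(x)
  sigma <- min(sd(x), IQR(x) / 1.34)
  0.9 * sigma * n^(-1/5)
}

x <- rnorm(100)
silverman(x)           # manual computation of the rule of thumb
bw.nrd0(x)             # R's built-in version; agrees for this data
fhat <- density(x, bw = "SJ")  # Sheather-Jones, an alternative selector
```

Rules of thumb like this are fast but tuned to near-Gaussian data; more adaptive selectors such as `"SJ"` or cross-validation (`"ucv"`) are among the methods treated later in the chapter.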

In the final section a likelihood analysis is carried out.
This is done to further clarify why regularized estimators are needed
to avoid overfitting to the data, and why there is in general no nonparametric
maximum likelihood estimator of a density. Regularization of the likelihood
can be achieved by constraining the density estimates to belong to a family
of increasingly flexible parametric densities that are fitted to data. This is
known as the *method of sieves*. Another approach is based on basis expansions,
but in either case, automatic selection of the amount of regularization is just
as important as for kernel methods.