3.8 Exercises

Nearest neighbors

Kernel estimators

Exercise 3.1 Consider a bivariate data set \((x_1, y_1), \ldots, (x_n, y_n)\) and let \(K\) be a probability density with mean 0. Then \[\hat{f}(x, y) = \frac{1}{n h^2} \sum_{i=1}^n K\left(\frac{x - x_i}{h}\right) K\left(\frac{y - y_i}{h}\right)\] is a bivariate kernel density estimator of the joint density of \(x\) and \(y\). Show that the kernel density estimator \[\hat{f}_1(x) = \frac{1}{n h} \sum_{i=1}^n K\left(\frac{x - x_i}{h}\right)\] is also the marginal density of \(x\) under \(\hat{f}\), and that the Nadaraya-Watson kernel smoother is the conditional expectation of \(y\) given \(x\) under \(\hat{f}\).
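A possible starting point (a sketch of the key identities, not a full solution): since \(K\) is a probability density with mean 0, the substitution \(u = (y - y_i)/h\) gives \(\int \frac{1}{h} K\left(\frac{y - y_i}{h}\right) \mathrm{d}y = 1\) and \(\int \frac{y}{h} K\left(\frac{y - y_i}{h}\right) \mathrm{d}y = y_i\). Marginalizing \(\hat{f}\) over \(y\) then yields \[\int \hat{f}(x, y) \, \mathrm{d}y = \frac{1}{n h} \sum_{i=1}^n K\left(\frac{x - x_i}{h}\right) \int \frac{1}{h} K\left(\frac{y - y_i}{h}\right) \mathrm{d}y = \hat{f}_1(x),\] and the conditional expectation becomes \[\frac{\int y \, \hat{f}(x, y) \, \mathrm{d}y}{\hat{f}_1(x)} = \frac{\sum_{i=1}^n K\left(\frac{x - x_i}{h}\right) y_i}{\sum_{i=1}^n K\left(\frac{x - x_i}{h}\right)},\] which is the Nadaraya-Watson smoother.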

Exercise 3.2 Suppose that \(K\) is a symmetric kernel and that the \(x_i\)-s are equidistant. Implement a function that computes the smoother matrix using the toeplitz function and \(O(n)\) kernel evaluations, where \(n\) is the number of data points. Also implement a function that computes the diagonal elements of the smoother matrix directly in \(O(n)\) time. Hint: find inspiration in the implementation of the running mean.
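One possible approach, sketched in Python/NumPy (the exercise presumably has R's `toeplitz` in mind; here the Toeplitz structure is built by index arithmetic instead, and the Gaussian kernel and the choice of grid are illustrative assumptions, not part of the exercise):

```python
import numpy as np

def gauss_kernel(u):
    """Standard Gaussian density: a symmetric kernel with mean 0."""
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def smoother_matrix(x, h, kernel=gauss_kernel):
    """Nadaraya-Watson smoother matrix for equidistant x.

    For equidistant x the weight K((x_i - x_j) / h) depends only on
    |i - j|, so n kernel evaluations determine the whole (Toeplitz)
    matrix of numerators; each row is then normalized to sum to one.
    """
    n = len(x)
    k = kernel((x - x[0]) / h)           # the n kernel evaluations
    idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    num = k[idx]                          # Toeplitz numerator matrix
    return num / num.sum(axis=1, keepdims=True)

def smoother_diagonal(x, h, kernel=gauss_kernel):
    """Diagonal of the smoother matrix in O(n) time.

    Row i of the numerator matrix sums to
    sum_{d=0}^{i} k_d + sum_{d=1}^{n-1-i} k_d, which cumulative sums
    of the kernel values deliver without forming the matrix, in the
    spirit of the running-mean implementation.
    """
    k = kernel((x - x[0]) / h)
    c = np.cumsum(k)
    row_sums = c + c[::-1] - k[0]
    return k[0] / row_sums
```

The numerator matrix could equally be formed with `scipy.linalg.toeplitz(k)` (or `toeplitz()` in R, as the exercise suggests); only the row normalization breaks the Toeplitz symmetry near the boundaries.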