## 1.2 Monte Carlo methods

Broadly speaking, Monte Carlo methods are computations that rely on some form of random input. The random input is generated by a (pseudo)random number generator according to distributional specifications, and the result of the computation depends on the actual value of the random input, but in a way where we understand the magnitude of the dependence quite well. In most cases we can make the dependence diminish by increasing the amount of random input.

In statistics it is quite interesting in itself that we can make the computer simulate data, so that we can draw an example data set from a statistical model. However, the real use of such simulations is almost always as part of a Monte Carlo computation, where we repeat the simulation of data sets a large number of times and compute distributional properties of various statistics. This is just one of the more obvious applications of Monte Carlo methods in statistics, and there are many others. Whatever the specific computation of interest in an application, a core problem of Monte Carlo methods is efficient simulation of random variables from a given target distribution.

### 1.2.1 Univariate von Mises distributions

We will exemplify Monte Carlo computations by considering angle distributions just as in Section 1.1. The angles take values in the interval \((-\pi, \pi]\), and we will consider models based on the von Mises distribution on this interval, which has density

\[f(x) = \frac{1}{\varphi(\theta)} e^{\theta_1 \cos(x) + \theta_2 \sin(x)}\]

for \(\theta = (\theta_1, \theta_2)^T \in \mathbb{R}^2\). A common
alternative parametrization is obtained by introducing
\(\kappa = \|\theta\|_2 = \sqrt{\theta_1^2 + \theta_2^2}\), and
(whenever \(\kappa \neq 0\))
\(\nu = \theta / \kappa = (\cos(\mu), \sin(\mu))^T\) for \(\mu \in (-\pi, \pi]\).
Using the \((\kappa, \mu)\)-parametrization the density becomes

\[f(x) = \frac{1}{\varphi(\kappa \nu)} e^{\kappa \cos(x - \mu)}.\]
The former parametrization in terms of
\(\theta\) is, however, the canonical
parametrization of the family of distributions as an exponential family, which
is particularly useful for various likelihood estimation algorithms.
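The conversion between the two parametrizations is elementary trigonometry. As a small illustration (the helper function names below are ours, not from any package), in R:

```
# Convert the canonical parameter theta to (kappa, mu) and back
theta_to_polar <- function(theta) {
  kappa <- sqrt(sum(theta^2))
  mu <- atan2(theta[2], theta[1]) # recovers mu in (-pi, pi]
  c(kappa = kappa, mu = mu)
}
polar_to_theta <- function(kappa, mu) kappa * c(cos(mu), sin(mu))
theta_to_polar(c(1, 1))         # kappa = sqrt(2), mu = pi / 4
polar_to_theta(sqrt(2), pi / 4) # recovers c(1, 1)
```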
The normalization constant

\[\begin{align*} \varphi(\kappa \nu) & = \int_{-\pi}^\pi e^{\kappa \cos(x - \mu)}\mathrm{d} x \\ & = 2 \pi \int_{0}^{1} e^{\kappa \cos(\pi x)}\mathrm{d} x = 2 \pi I_0(\kappa) \end{align*}\]

is given in terms of the modified Bessel function
\(I_0\). We can easily compute and plot the density using R’s `besselI` implementation of the modified Bessel function.

```
phi <- function(k) 2 * pi * besselI(k, 0)
curve(exp(cos(x)) / phi(1), -pi, pi, lwd = 2, ylab = "density", ylim = c(0, 0.52))
curve(exp(2 * cos(x - 1)) / phi(2), col = "red", lwd = 2, add = TRUE)
curve(exp(0.5 * cos(x + 1.5)) / phi(0.5), col = "blue", lwd = 2, add = TRUE)
```

It is not entirely obvious how we should go about simulating data points
from the von Mises distribution. It will be demonstrated in Section
4.3 how to implement a *rejection sampler*, which is one
useful algorithm for simulating samples from a distribution with a density.
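To give an idea of the principle ahead of the full treatment, here is a minimal sketch of a rejection sampler for the von Mises distribution (the function name is ours): proposals are drawn uniformly on \((-\pi, \pi]\), and since \(e^{\kappa \cos(x - \mu)} \leq e^{\kappa}\), a proposal \(x\) is accepted with probability \(e^{\kappa(\cos(x - \mu) - 1)}\).

```
# Minimal rejection sampler for the von Mises distribution (illustrative).
# Uniform proposals on (-pi, pi]; acceptance probability is the target
# density divided by its upper bound exp(kappa) times the proposal density.
rvM_reject <- function(n, kappa, mu) {
  x <- numeric(n)
  i <- 0
  while (i < n) {
    proposal <- runif(1, -pi, pi)
    if (runif(1) < exp(kappa * (cos(proposal - mu) - 1))) {
      i <- i + 1
      x[i] <- proposal
    }
  }
  x
}
set.seed(1)
x <- rvM_reject(1000, kappa = 2, mu = 1)
```

The one-proposal-at-a-time loop is deliberately simple and not efficient; a vectorized version would propose in batches.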

In this section we simply use the `rmovMF` function from the `movMF` package, which implements a few functions for working with (finite mixtures of) von Mises distributions, and even the general von Mises-Fisher distributions that are generalizations of the von Mises distribution to \(p\)-dimensional unit spheres.

```
library("movMF")
xy <- rmovMF(500, 0.5 * c(cos(-1.5), sin(-1.5)))
# rmovMF represents samples as elements on the unit circle
x <- acos(xy[, 1]) * sign(xy[, 2])
```

```
hist(x, breaks = seq(-pi, pi, length.out = 15), prob = TRUE)
rug(x)
density(x, bw = "SJ", cut = 0) %>% lines(col = "red", lwd = 2)
curve(exp(0.5 * cos(x + 1.5)) / phi(0.5), col = "blue", lwd = 2, add = TRUE)
```

### 1.2.2 Mixtures of von Mises distributions

The von Mises distributions are unimodal distributions on \((-\pi, \pi]\). Thus to
find a good model of the bimodal angle data, say, we have to move beyond these
distributions. A standard approach for constructing multimodal distributions
is as *mixtures* of unimodal distributions. A mixture of two von Mises distributions
can be constructed by flipping a (biased) coin to decide which of the two
distributions to sample from. We will use the exponential family parametrization
in the following.

```
thetaA <- c(3.5, -2)
thetaB <- c(-4.5, 4)
alpha <- 0.55 # Probability of von Mises distribution A
# The sample function implements the "coin flips"
u <- sample(c(1, 0), 500, replace = TRUE, prob = c(alpha, 1 - alpha))
xy <- rmovMF(500, thetaA) * u + rmovMF(500, thetaB) * (1 - u)
x <- acos(xy[, 1]) * sign(xy[, 2])
```

The `rmovMF` function actually implements simulation from a mixture distribution directly, so there is no need to construct the “coin flips” explicitly.

```
theta <- rbind(thetaA, thetaB)
xy <- rmovMF(length(x), theta, c(alpha, 1 - alpha))
x_alt <- acos(xy[, 1]) * sign(xy[, 2])
```

To compare the simulated data with two mixture components to the model and
a smoothed density, we implement an R function that computes the density
for an angle argument using the function `dmovMF`, which takes a unit circle argument.

```
dvM <- function(x, theta, alpha) {
  xx <- cbind(cos(x), sin(x))
  dmovMF(xx, theta, c(alpha, 1 - alpha)) / (2 * pi)
}
```

Note that `dmovMF` uses normalized spherical measure on the unit circle as reference measure, thus the need for the \(2\pi\) division if we want the result to be comparable to histograms and density estimates that use Lebesgue measure on \((-\pi, \pi]\) as the reference measure.
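As a sanity check of this normalization, the density should integrate to one over \((-\pi, \pi]\) (the snippet reuses `dvM`, `theta` and `alpha` as defined above):

```
# Sanity check: dvM should integrate to 1 over (-pi, pi]
integrate(dvM, -pi, pi, theta = theta, alpha = alpha) # approximately 1
```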

```
hist(x, breaks = seq(-pi, pi, length.out = 15), prob = TRUE, ylim = c(0, 0.5))
rug(x)
density(x, bw = "SJ", cut = 0) %>% lines(col = "red", lwd = 2)
curve(dvM(x, theta, alpha), col = "blue", lwd = 2, add = TRUE)
hist(x_alt, breaks = seq(-pi, pi, length.out = 15), prob = TRUE, ylim = c(0, 0.5))
rug(x_alt)
density(x_alt, bw = "SJ", cut = 0) %>% lines(col = "red", lwd = 2)
curve(dvM(x, theta, alpha), col = "blue", lwd = 2, add = TRUE)
```

Simulation of data from a distribution finds many applications. The technique is widely used whenever we want to investigate a statistical methodology in terms of its frequentist performance under various data sampling models, and simulation is a tool of fundamental importance for the practical application of Bayesian statistical methods. Another important application is as a tool for computing approximations of integrals. This is usually called Monte Carlo integration and is a form of numerical integration. Probabilities and values of distribution functions are notable examples of integrals, and we consider here the computation of the probability of the interval \((0, 1)\) for the above mixture of two von Mises distributions.

It is straightforward to compute this probability via Monte Carlo integration as a simple average. Note that we use a large number of simulated angles, 50,000 in this case, for this computation. Increasing the number further makes the result more accurate. Chapter 5 deals with the assessment of the accuracy of Monte Carlo integrals, and with how this random error can be estimated, bounded and minimized.

```
xy <- rmovMF(50000, theta, c(alpha, 1 - alpha))
x <- acos(xy[, 1]) * sign(xy[, 2])
mean(x > 0 & x < 1) # Estimate of the probability of the interval (0, 1)
```

`## [1] 0.08662`
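A rough indication of the accuracy already follows from the binomial sampling variation of the average (a back-of-the-envelope sketch; Chapter 5 treats error assessment properly):

```
# Rough standard error of the simple average above (binomial variation)
p_hat <- 0.08662
sqrt(p_hat * (1 - p_hat) / 50000) # about 0.0013
```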

The probability above could, of course, be expressed using the distribution function of the mixture of von Mises distributions, which in turn can be computed in terms of integrals of von Mises densities. Specifically, the probability is \[p = \frac{\alpha}{\varphi(\theta_A)} \int_0^1 e^{\theta_{A, 1} \cos(x) + \theta_{A, 2} \sin(x)} \mathrm{d} x + \frac{1 - \alpha}{\varphi(\theta_B)} \int_0^1 e^{\theta_{B, 1} \cos(x) + \theta_{B, 2} \sin(x)} \mathrm{d} x,\] but these integrals do not have a simple analytic representation – just as the distribution function of the von Mises distribution doesn’t have a simple analytic expression. Thus the computation of the probability requires numerical computation of the integrals.

The R function `integrate` can be used for numerical integration of univariate
functions using standard numerical integration techniques. We can thus
compute the probability by integrating the density of the mixture,
as implemented above in the R function `dvM`. Note the arguments passed
to `integrate` below. The first argument is the density function, then follow
the lower and the upper limits of the integration, and then follow additional
arguments to the density – in this case parameter values.
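The call itself, reconstructed here from the description above (it reuses `dvM`, `theta` and `alpha` as defined earlier), is of the form:

```
integrate(dvM, 0, 1, theta = theta, alpha = alpha)
```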

`## 0.08635171 with absolute error < 9.6e-16`

The `integrate` function in R is an interface to a couple of classical
QUADPACK Fortran routines for
numerical integration via adaptive quadrature.
Specifically, the computations rely on approximations of the form
\[\int_a^b f(x) \mathrm{d} x \simeq \sum_i w_i f(x_i)\]
for certain *grid points* \(x_i\) and weights \(w_i\), which are computed using
Gauss-Kronrod quadrature.
This method provides an estimate of the approximation error in addition to
the numerical approximation of the integral itself.
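To illustrate the weighted-sum form, here is a small base R sketch using Simpson's rule (a simpler rule than Gauss-Kronrod quadrature; the function name is ours):

```
# Simpson's rule written as an explicit weighted sum over grid points
simpson <- function(f, a, b, n = 100) { # n must be even
  x <- seq(a, b, length.out = n + 1)
  w <- c(1, rep(c(4, 2), length.out = n - 1), 1) * (b - a) / (3 * n)
  sum(w * f(x))
}
simpson(function(x) exp(cos(x)), -pi, pi) # approximately 2 * pi * besselI(1, 0)
```

Adaptive quadrature, as used by `integrate`, refines the grid where the integrand varies most, rather than using a fixed grid as above.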

It is noteworthy that `integrate`, as a function implemented in R,
takes another function, in this case the density `dvM`, as an argument. R
is a functional programming language
and functions are first-class citizens.
This implies, for instance, that functions can be passed as arguments to other
functions using a variable name – just like any other variable can be passed
as an argument to a function. In the parlance of functional programming,
`integrate` is a functional:
a higher-order function that takes a function as argument and returns a
numerical value. One of the themes of this book is how to
make good use of functional (and object-oriented) programming features in R
to write clear, expressive and modular code without sacrificing computational
efficiency.

Returning to the specific problem of the computation of an integral, we may ask
what the purpose of Monte Carlo integration is when we can apparently
just do numerical integration using e.g. `integrate`. There are at least two
reasons why Monte Carlo integration is sometimes preferable. First, it is straightforward
to implement and often works quite well for multivariate and even high-dimensional
integrals, whereas grid-based numerical integration schemes scale poorly with
the dimension. Second, it does not require that we have an analytic representation
of the density. It is common in statistical applications that we are interested
in the distribution of a statistic, which is a complicated transformation of
data, and whose density is difficult or impossible to find analytically. Yet if
we can just simulate data, we can simulate from the distribution of the statistic,
and we can then use Monte Carlo integration to compute whatever
probability or integral w.r.t. the distribution of the statistic that we are
interested in.
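As a toy illustration of this last point (our own example, chosen so that the answer can be checked exactly via order statistics), we approximate \(P(\mathrm{median} > 1)\) for the median of nine standard exponential variables by simulating from the distribution of the statistic:

```
# Monte Carlo estimate of P(median > 1) for the median of 9
# standard exponential variables
set.seed(27)
medians <- replicate(10000, median(rexp(9)))
mean(medians > 1) # close to the exact value pbinom(4, 9, 1 - exp(-1))
```

The simulation route requires no analysis of the statistic at all, which is exactly what makes it attractive for genuinely complicated statistics.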

### 1.2.3 Large scale Monte Carlo methods

Monte Carlo methods are used pervasively in statistics and in many other sciences today. It is nowadays trivial to simulate millions of data points from a simple univariate distribution like the von Mises distribution, and Monte Carlo methods are generally attractive because they allow us to solve computational problems approximately in many cases where exact or analytic computations are impossible and alternative deterministic numerical computations require more adaptation to the specific problem.

Monte Carlo methods are thus really good off-the-shelf methods, but scaling the methods up can be a challenge and is a contemporary research topic. For the simulation of multivariate \(p\)-dimensional data it can, for instance, make a big difference whether the algorithms scale linearly in \(p\) or like \(O(p^a)\) for some \(a > 1\). Likewise, some Monte Carlo methods (e.g. Bayesian computations) depend on a data set of size \(n\), and for large scale data sets, algorithms that scale like \(O(n)\) are really the only algorithms that can be used in practice.