## 4.4 Adaptive envelopes

A good envelope should be tight, meaning that $$\alpha$$ is close to one, it should be fast to simulate from and have a density that is fast to evaluate. It is not obvious how to find such an envelope for an arbitrary target density $$f$$.

This section develops a general scheme for the construction of envelopes for all log-concave target densities. This is a special class of densities, but it is not uncommon in practice. The scheme can also be extended to work for some densities that have combinations of log-concave and log-convex behaviors. The same idea used for constructing envelopes can be used to bound $$f$$ from below. The accept-reject step can then avoid many evaluations of $$f$$, which is beneficial if $$f$$ is computationally expensive to evaluate.

The key idea of the scheme is to bound the log-density by piecewise affine functions. This is particularly easy to do if the density is log-concave. The scheme leads to analytically manageable formulas for the envelope, its corresponding distribution function and its inverse, and as a result it is fast to simulate proposals and compute the envelope as needed in the accept-reject step.

The scheme requires the choice of a finite number of points to determine the affine bounds. For any given choice of points the scheme adapts the envelope to the target density automatically. It is possible to implement a fully adaptive scheme that doesn’t even require the choice of points but initializes and updates the points dynamically as more and more rejection samples are computed. In this section the focus is on the scheme with a given and fixed number of points.

For a continuously differentiable, strictly positive and log-concave target on an open interval $$I \subseteq \mathbb{R}$$ it holds that $\log(f(x)) \leq \frac{f'(x_0)}{f(x_0)}(x - x_0) + \log(f(x_0))$ for any $$x, x_0 \in I$$.

Let $$x_1 < x_2 < \ldots < x_{m} \in I$$ and let $$I_1, \ldots, I_m \subseteq I$$ be intervals that form a partition of $$I$$ such that $$x_i \in I_i$$. Defining $a_i = (\log(f(x_i)))' = \frac{f'(x_i)}{f(x_i)} \quad \text{and} \quad b_i = \log(f(x_i)) - \alpha_i x_i$ we find the upper bound $\log(f(x)) \leq V(x) = \sum_{i=1}^m (a_i x + b_i) 1_{I_i}(x),$ or $f(x) \leq e^{V(x)}.$ Note that by the log-concavity of $$f$$, $$a_1 \geq a_2 \geq \ldots \geq a_m$$. The upper bound is integrable over $$I$$ if either $$a_1 > 0$$ and $$a_m < 0$$, or $$a_m < 0$$ and $$I$$ is bounded to the left, or $$a_1 > 0$$ and $$I$$ is bounded to the right. In any of these cases we define $c = \int_I e^{V(x)} \mathrm{d} x < \infty$ and $$g(x) = c^{-1} \exp(V(x))$$, and we find that with $$\alpha = c^{-1}$$ then $$\alpha f \leq g$$ and $$g$$ is an envelope of $$f$$. Note that it is actually not necessary to compute $$c$$ (or $$\alpha$$) to implement the rejection step in the rejection sampler, but that $$c$$ is needed for simulating from $$g$$ as described below. We will assume in the following that $$c < \infty$$.

The intervals $$I_i$$ have not been specified, and we could, in fact, implement rejection sampling with any choice of intervals fulfilling the conditions above. But in the interest of maximizing $$\alpha$$ (minimizing $$c$$) and thus minimizing the rejection frequency, we should choose $$I_i$$ so that $$a_i x + b_i$$ is minimal over $$I_i$$ among all the affine upper bounds. This will result in the tightest envelope. This means that for $$i = 1, \ldots, m - 1$$, $$I_i = (z_{i-1}, z_i]$$ with $$z_i$$ the point where $$a_i x + b_i$$ and $$a_{i+1} x + b_{i+1}$$ intersect. We find that the solution of $a_i x + b_i = a_{i+1} x + b_{i+1}$ is $z_i = \frac{b_{i+1} - b_i}{a_i - a_{i+1}}$ provided that $$a_{i+1} > a_i$$. The two extremes, $$z_0$$ and $$z_m$$, are chosen as the endpoints of $$I$$ and may be $$- \infty$$ and $$+ \infty$$, respectively.

One way to simulate from such envelopes is by transformation of uniform random variables by the inverse distribution function. It requires a little bookkeeping, but is otherwise straightforward. Define for $$x \in I_i$$ $F_i(x) = \int_{z_{i-1}}^x e^{a_i z + b_i} \mathrm{d} z,$ and let $$R_i = F_i(z_i)$$. Then $$c = \sum_{i=1}^m R_i$$, and if we define $$Q_i = \sum_{k=1}^{i} R_k$$ for $$i = 0, \ldots, m$$ the inverse of the distribution function in $$q$$ is given as the solution to the equation $F_i(x) = cq - Q_{i-1}, \qquad Q_{i-1} < cq \leq Q_{i}.$ That is, for a given $$q \in (0, 1)$$, first determine which interval $$(Q_{i-1}, Q_{i}]$$ that $$c q$$ falls into, and then solve the corresponding equation. Observe that when $$a_i \neq 0$$, $F_i(x) = \frac{1}{a_i}e^{b_i}\left(e^{a_i x} - e^{a_i z_{i-1}}\right).$

### 4.4.1 Beta distribution

To illustrate the envelope construction above for a simple log-concave density we consider the Beta distribution on $$(0, 1)$$ with shape parameters $$\geq 1$$. This distribution has density $f(x) \propto x^{\alpha - 1}(1-x)^{\beta - 1},$ which is log-concave (when the shape parameters are greater than one). We implement the rejection sampling algorithm for this density with the adaptive envelope using two points.

Betasim <- function(n, x1, x2, alpha, beta) {
lf <- function(x) (alpha - 1) * log(x) + (beta - 1) * log(1 - x)
lf_deriv <- function(x) (alpha - 1)/x - (beta - 1)/(1 - x)
a1 <- lf_deriv(x1)
a2 <- lf_deriv(x2)
if(a1 == 0 || a2 == 0 || a1 - a2 == 0)
stop("\nThe implementation requires a_1 and a_2 different
and both different from zero. Choose different values of x_1 and x_2.")
b1 <- lf(x1) - a1 * x1
b2 <- lf(x2) - a2 * x2
z1 <- (b2 - b1) / (a1 - a2)
Q1 <- exp(b1) * (exp(a1 * z1) - 1) / a1
c <- Q1 + exp(b2) * (exp(a2 * 1) - exp(a2 * z1)) / a2

y <- numeric(n)
uy <- rng_stream(n, runif)
u <- rng_stream(n, runif)
for(i in 1:n) {
reject <- TRUE
while(reject) {
u0 <- c * uy(n - i)
if(u0 < Q1) {
z <- log(a1 * exp(-b1) * u0 + 1) / a1
reject <- u(n - i) > exp(lf(z) - a1 * z - b1)
} else {
z <- log(a2 * exp(-b2) * (u0 - Q1) + exp(a2 * z1)) / a2
reject <- u(n - i) > exp(lf(z) - a2 * z - b2)
}
}
y[i] <- z
}
y
} Figure 4.6: Histograms of simulated variables from Beta distributions using the rejection sampler with the adaptive envelope based on log-concavity. The true density (blue) and the envelope (red) are added to the plots.

Note that as a safeguard we implemented a test on the $$a_i$$-s to check that the formulas used are actually meaningful, specifically that there are no divisions by zero.

Betasim(1, x1 = 0.25, x2 = 0.75, alpha = 4, beta = 2)  
## Error in Betasim(1, x1 = 0.25, x2 = 0.75, alpha = 4, beta = 2): ##
The implementation requires a_1 and a_2 different ## and both different
from zero. Choose different values of x_1 and x_2.

r
Betasim(1, x1 = 0.2, x2 = 0.75, alpha = 4, beta = 2)   
## Error in Betasim(1, x1 = 0.2, x2 = 0.75, alpha = 4, beta = 2): ##
The implementation requires a_1 and a_2 different ## and both different
from zero. Choose different values of x_1 and x_2.

r
Betasim(1, x1 = 0.2, x2 = 0.8, alpha = 4, beta = 2)    
##  0.761879

### 4.4.2 von Mises distribution

The von Mises rejection sampler in Section 4.3.1 used the uniform distribution as proposal distribution. As it turns out, the uniform density is not a particularly tight envelope. We illustrate this by studying the proportion of rejections for our previous implementation.

y <- vMsim(10000, 0.1, trace = TRUE)
y <- vMsim(10000, 0.5, trace = TRUE)
y <- vMsim(10000, 2, trace = TRUE)
y <- vMsim(10000, 5, trace = TRUE)
## kappa = 0.1 : 0.09156977
## kappa = 0.5 : 0.3561679
## kappa = 2 : 0.6932986
## kappa = 5 : 0.816507

The rejection frequency is high and increases with $$\kappa$$. For $$\kappa = 5$$ more than 80% of the proposals are rejected, and simulating $$n = 10,000$$ von Mises distributed variables thus requires the simulation of around $$50,000$$ variables from the proposal.

The von Mises density is, unfortunately, not log-concave on $$(-\pi, \pi)$$, but it is on $$(-\pi/2, \pi/2)$$. It is, furthermore, log-convex on $$(-\pi, -\pi/2)$$ as well as $$(\pi/2, \pi)$$, which implies that on these two intervals the log-density is below the corresponding chords. These chords can be pieced together with tangents to give an envelope.

vMsim_adapt <- function(n, x1, x2, kappa, trace = FALSE) {
lf <- function(x) kappa * cos(x)
lf_deriv <- function(x) - kappa * sin(x)
a1 <- 2 * kappa / pi
a2 <- lf_deriv(x1)
a3 <- lf_deriv(x2)
a4 <- - a1

b1 <- kappa
b2 <- lf(x1) - a2 * x1
b3 <- lf(x2) - a3 * x2
b4 <- kappa

z0 <- -pi
z1 <- -pi/2
z2 <- (b3 - b2) / (a2 - a3)
z3 <- pi/2
z4 <- pi

Q1 <- exp(b1) * (exp(a1 * z1) - exp(a1 * z0)) / a1
Q2 <- Q1 + exp(b2) * (exp(a2 * z2) - exp(a2 * z1)) / a2
Q3 <- Q2 + exp(b3) * (exp(a3 * z3) - exp(a3 * z2)) / a3
c <- Q3 + exp(b4) * (exp(a4 * z4) - exp(a4 * z3)) / a4

count <- 0
y <- numeric(n)
uy <- rng_stream(n, runif)
u <- rng_stream(n, runif)
for(i in 1:n) {
reject <- TRUE
while(reject) {
count <- count + 1
u0 <- c * uy(n - i)
if(u0 < Q1) {
z <- log(a1 * exp(-b1) * u0 + exp(a1 * z0)) / a1
reject <- u(n - i) > exp(lf(z) - a1 * z - b1)
} else if(u0 < Q2) {
z <- log(a2 * exp(-b2) * (u0 - Q1) + exp(a2 * z1)) / a2
reject <- u(n - i) > exp(lf(z) - a2 * z - b2)
} else if(u0 < Q3) {
z <- log(a3 * exp(-b3) * (u0 - Q2) + exp(a3 * z2)) / a3
reject <- u(n - i) > exp(lf(z) - a3 * z - b3)
} else {
z <- log(a4 * exp(-b4) * (u0 - Q3) + exp(a4 * z3)) / a4
reject <- u(n - i) > exp(lf(z) - a4 * z - b4)
}
}
y[i] <- z
}
if(trace)
cat("kappa =", kappa, ", x1 =", x1,
", x2 =", x2, ":", (count - n) / count, "\n")
y
} Figure 4.7: Histograms of simulated variables from von Mises distributions using the rejection sampler with the adaptive envelope based on a combination of log-concavity and log-convexity. The true density (blue) and the envelope (red) are added to the plots.

y <- vMsim_adapt(100000, -0.4, 0.4, 5, trace = TRUE)
y <- vMsim_adapt(100000, -1, 1, 2, trace = TRUE)
y <- vMsim_adapt(100000, -0.1, 0.1, 5, trace = TRUE)
y <- vMsim_adapt(100000, -0.4, 0.4, 2, trace = TRUE)
## kappa = 5 , x1 = -0.4 , x2 = 0.4 : 0.1996414
## kappa = 2 , x1 = -1 , x2 = 1 : 0.2412459
## kappa = 5 , x1 = -0.1 , x2 = 0.1 : 0.4844059
## kappa = 2 , x1 = -0.4 , x2 = 0.4 : 0.1557619

We see that compared to using the uniform density as envelope, these adaptive envelopes are generally tighter and leads to fewer rejections. Even tighter envelopes are possible by using more than four intervals, but it is, of course, always a good question how the added complexity and bookkeeping induced by using more advanced and adaptive envelopes affect run time. It is even a good question if our current adaptive implementation will outperform our first, and much simpler, implementation that used the uniform envelope.

microbenchmark(vMsim_adapt(100, -1, 1, 5),
vMsim_adapt(100, -0.4, 0.4, 5),
vMsim_adapt(100, -0.2, 0.2, 5),
vMsim_adapt(100, -0.1, 0.1, 5),
vMsim(100, 5),
vMsim_vec(100, 5)
)
## Unit: microseconds
##                            expr min  lq mean median  uq  max neval
##      vMsim_adapt(100, -1, 1, 5) 568 652  709    699 754  939   100
##  vMsim_adapt(100, -0.4, 0.4, 5) 285 322  354    346 369  520   100
##  vMsim_adapt(100, -0.2, 0.2, 5) 345 378  413    396 430  705   100
##  vMsim_adapt(100, -0.1, 0.1, 5) 411 474  527    500 549 1297   100
##                   vMsim(100, 5) 596 686  772    742 823 1559   100
##               vMsim_vec(100, 5)  61  76   91     83  96  243   100

The results from the benchmark show that the adaptive implementation has run time comparable to using the uniform proposal. With $$x_1 = -0.4$$ and $$x_2 = 0.4$$ and $$\kappa = 5$$ we found above that the rejection frequency was about 20% with the adaptive envelope, while it was about 80% when using the uniform envelope. A naive computation would thus suggest a speedup of a factor 4, but using the adaptive envelope there is actually only a speedup of a factor 2. And the vectorized solution is still considerably faster. A completely vectorized solution using the adaptive envelope is possible, but it is not entirely straightforward how to implement the more complicated envelope efficiently, and it may be a better option in this case to implement it using Rcpp.

Even if either implementation can be improved further in terms of run time, it is an important point when comparing algorithms that we don’t get too focused on surrogate performance quantities. The probability of rejection is a surrogate for actual run time, and it might be conceptually of interest to bring this probability down. But if it is at the expense of additional computations it might not be worth the effort in terms of real run time.