## 3.6 Gaussian processes

Suppose that $$X = X_{1:n} \sim \mathcal{N}(\xi_x, \Sigma_{x})$$ with $\mathrm{cov}(X_i, X_j) = K(t_i - t_j)$ for a stationary kernel function $$K$$, so that the covariance depends only on the lag $t_i - t_j$ between the observation times.
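As a concrete illustration, the covariance matrix $\Sigma_x$ can be filled in from any such kernel evaluated at the pairwise lags. The squared-exponential kernel below is a hypothetical choice, not one fixed by the text:

```python
import numpy as np

# Hypothetical stationary kernel (squared exponential); the text only
# assumes some kernel K of the lag t_i - t_j.
def K(lag, length_scale=1.0):
    return np.exp(-0.5 * (lag / length_scale) ** 2)

t = np.linspace(0.0, 4.0, 9)              # observation times t_1, ..., t_n
Sigma_x = K(t[:, None] - t[None, :])      # (Sigma_x)_{ij} = K(t_i - t_j)

# Stationarity on a regular grid gives a symmetric matrix that is
# constant along diagonals (Toeplitz).
assert np.allclose(Sigma_x, Sigma_x.T)
```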

With the observation equation $$Y_i = X_i + \delta_i$$ for $$\delta = \delta_{1:n} \sim \mathcal{N}(0, \Omega)$$ and $$\delta \perp \! \! \perp X$$ we get

$(X, Y) \sim \mathcal{N}\left(\left(\begin{array}{c} \xi_x \\ \xi_x \end{array}\right), \left(\begin{array}{cc} \Sigma_x & \Sigma_x \\ \Sigma_x & \Sigma_x + \Omega \end{array} \right) \right).$

Hence $E(X \mid Y) = \xi_x + \Sigma_x (\Sigma_x + \Omega)^{-1} (Y - \xi_x).$
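This formula can be applied directly, preferably via a linear solve rather than an explicit inverse. The kernel, grid, and noise level below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: squared-exponential kernel on a regular grid.
def K(lag, length_scale=1.0):
    return np.exp(-0.5 * (lag / length_scale) ** 2)

t = np.linspace(0.0, 4.0, 50)
n = len(t)
Sigma_x = K(t[:, None] - t[None, :])
Omega = 0.1 * np.eye(n)                   # observation noise covariance
xi_x = np.zeros(n)                        # prior mean xi_x

# Simulate X ~ N(xi_x, Sigma_x) and Y = X + delta.
X = rng.multivariate_normal(xi_x, Sigma_x)
Y = X + rng.multivariate_normal(np.zeros(n), Omega)

# E(X | Y) = xi_x + Sigma_x (Sigma_x + Omega)^{-1} (Y - xi_x)
X_hat = xi_x + Sigma_x @ np.linalg.solve(Sigma_x + Omega, Y - xi_x)
```

With $\xi_x = 0$ this reduces to $\hat X = \Sigma_x (\Sigma_x + \Omega)^{-1} Y$, the linear smoother discussed next.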

Assuming that $$\xi_x = 0$$, the conditional expectation is a linear smoother with smoother matrix $S = \Sigma_x (\Sigma_x + \Omega)^{-1}.$

More generally, $E(X \mid Y) = SY + (I - S)\xi_x$, so the conditional expectation equals $SY$ whenever $\Sigma_x (\Sigma_x + \Omega)^{-1} \xi_x = \xi_x$. If this identity holds approximately, we can compute $$E(X \mid Y)$$ without knowing $$\xi_x$$.

If the observation variance is $$\Omega = \sigma^2 I$$, then the smoother matrix is $\Sigma_x (\Sigma_x + \sigma^2 I)^{-1} = (I + \sigma^2 \Sigma_x^{-1})^{-1}.$
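The identity follows from $(I + \sigma^2 \Sigma_x^{-1})^{-1} = \bigl(\Sigma_x^{-1}(\Sigma_x + \sigma^2 I)\bigr)^{-1} = (\Sigma_x + \sigma^2 I)^{-1}\Sigma_x$, which equals $\Sigma_x(\Sigma_x + \sigma^2 I)^{-1}$ because both factors are functions of $\Sigma_x$ and hence commute. A quick numerical check, using an arbitrary positive definite $\Sigma_x$ as a stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical positive definite covariance matrix.
A = rng.standard_normal((5, 5))
Sigma_x = A @ A.T + 5.0 * np.eye(5)
sigma2 = 0.3
I = np.eye(5)

# Both expressions for the smoother matrix agree.
S1 = Sigma_x @ np.linalg.inv(Sigma_x + sigma2 * I)
S2 = np.linalg.inv(I + sigma2 * np.linalg.inv(Sigma_x))
assert np.allclose(S1, S2)
```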