3.6 Gaussian processes

Suppose that \(X = X_{1:n} \sim \mathcal{N}(\xi_x, \Sigma_{x})\) with \[\mathrm{cov}(X_i, X_j) = K(t_i - t_j)\] for a stationary kernel function \(K\), where \(t_i\) denotes the point (e.g., a time point) at which \(X_i\) is observed.
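As a minimal sketch, \(\Sigma_x\) can be built by evaluating the kernel on all pairwise differences of the observation points. The Gaussian kernel and the scale parameter below are illustrative choices, not prescribed by the text.

```python
import numpy as np

def kernel_matrix(t, K):
    """Covariance matrix Sigma_x with entries K(t_i - t_j)."""
    t = np.asarray(t, dtype=float)
    return K(t[:, None] - t[None, :])

def gauss_kernel(d, scale=1.0):
    """One possible choice of K: a Gaussian (squared exponential) kernel."""
    return np.exp(-d**2 / (2 * scale**2))

t = np.linspace(0, 1, 5)                       # observation points t_1, ..., t_n
Sigma_x = kernel_matrix(t, gauss_kernel)       # n x n covariance matrix
```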

With the observation equation \(Y_i = X_i + \delta_i\), where \(\delta = \delta_{1:n} \sim \mathcal{N}(0, \Omega)\) and \(\delta \perp \! \! \perp X\), we get

\[(X, Y) \sim \mathcal{N}\left(\left(\begin{array}{c} \xi_x \\ \xi_x \end{array}\right), \left(\begin{array}{cc} \Sigma_x & \Sigma_x \\ \Sigma_x & \Sigma_x + \Omega \end{array} \right) \right).\]
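The blocks of the joint covariance matrix follow directly from the independence of \(\delta\) and \(X\):

\[\mathrm{cov}(X, Y) = \mathrm{cov}(X, X + \delta) = \Sigma_x \qquad \text{and} \qquad \mathrm{cov}(Y, Y) = \Sigma_x + \Omega.\]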

Hence \[E(X \mid Y) = \xi_x + \Sigma_x (\Sigma_x + \Omega)^{-1} (Y - \xi_x).\]

Assuming that \(\xi_x = 0\), the conditional expectation \(E(X \mid Y) = SY\) is a linear smoother of \(Y\) with smoother matrix \[S = \Sigma_x (\Sigma_x + \Omega)^{-1}.\]
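A minimal sketch of the smoother, assuming \(\xi_x = 0\), \(\Omega = \sigma^2 I\), and the same Gaussian kernel construction of \(\Sigma_x\) as above; all parameter values and the diagonal jitter are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, scale = 50, 0.3, 0.1
t = np.linspace(0, 1, n)

# Sigma_x from a Gaussian kernel K(d) = exp(-d^2 / (2 scale^2)); the small
# diagonal jitter is only for numerical stability.
Sigma_x = np.exp(-(t[:, None] - t[None, :])**2 / (2 * scale**2)) + 1e-6 * np.eye(n)

# Smoother matrix S = Sigma_x (Sigma_x + sigma^2 I)^{-1}, computed via a
# linear solve instead of an explicit inverse (both matrices are symmetric).
S = np.linalg.solve(Sigma_x + sigma**2 * np.eye(n), Sigma_x).T

x = rng.multivariate_normal(np.zeros(n), Sigma_x)   # latent X with xi_x = 0
y = x + sigma * rng.normal(size=n)                   # observations Y = X + delta
x_hat = S @ y                                        # E(X | Y) = S Y
```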

More generally, \(E(X \mid Y) = SY + (I - S)\xi_x\), so the representation \(E(X \mid Y) = SY\) also holds whenever \(S \xi_x = \Sigma_x (\Sigma_x + \Omega)^{-1} \xi_x = \xi_x\). If this identity holds approximately, computing \(E(X \mid Y)\) does not require knowledge of \(\xi_x\).

If the observation variance is \(\Omega = \sigma^2 I\), and \(\Sigma_x\) is invertible, the smoother matrix can be rewritten as \[\Sigma_x (\Sigma_x + \sigma^2 I)^{-1} = \left[(\Sigma_x + \sigma^2 I) \Sigma_x^{-1}\right]^{-1} = (I + \sigma^2 \Sigma_x^{-1})^{-1}.\]
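A quick numerical sanity check of this identity; the particular symmetric positive definite \(\Sigma_x\) used here is arbitrary and only serves the check.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 5, 0.5
A = rng.normal(size=(n, n))
Sigma_x = A @ A.T + n * np.eye(n)   # a generic symmetric positive definite matrix

lhs = Sigma_x @ np.linalg.inv(Sigma_x + sigma**2 * np.eye(n))      # Sigma_x (Sigma_x + sigma^2 I)^{-1}
rhs = np.linalg.inv(np.eye(n) + sigma**2 * np.linalg.inv(Sigma_x)) # (I + sigma^2 Sigma_x^{-1})^{-1}
print(np.allclose(lhs, rhs))        # True
```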