## 8.2 Exponential families

We consider in this section the special case where the model of $$\mathbf{y}$$ is an exponential family Bayesian network, as in Section 6.1.2, and where $$x = M(\mathbf{y})$$ is the observed transformation.

The complete data log-likelihood is $\theta \mapsto \theta^T t(\mathbf{y}) - \kappa(\theta) = \theta^T \sum_{j=1}^m t_j(y_j) - \kappa(\theta),$ and we find that $Q(\theta \mid \theta') = \theta^T \sum_{j=1}^m E_{\theta'}(t_j(Y_j) \mid X = x) - E_{\theta'}( \kappa(\theta) \mid X = x).$

To maximize $$Q$$ we differentiate $$Q$$ and set the derivative equal to zero. The resulting equation is $\sum_{j=1}^m E_{\theta'}(t_j(Y_j) \mid X = x) = E_{\theta'}( \nabla \kappa(\theta) \mid X = x).$

Alternatively, the maximum of $$Q(\cdot \mid \theta')$$ can be found from the general equation $\sum_{j=1}^m E_{\theta'}(t_j(Y_j) \mid X = x) = \sum_{j=1}^m E_{\theta'}(E_{\theta}(t_j(Y_j) \mid y_1, \ldots, y_{j-1}) \mid X = x),$ which follows since $E_{\theta'}(\nabla \kappa(\theta)\mid X = x) = \sum_{j=1}^m E_{\theta'}(\nabla \log \varphi_j(\theta) \mid X = x) = \sum_{j=1}^m E_{\theta'}(E_{\theta}(t_j(Y_j) \mid y_1, \ldots, y_{j-1}) \mid X = x).$

Example 8.1 Continuing Example 6.4 with $$M$$ the projection map $(\mathbf{y}, \mathbf{z}) \mapsto \mathbf{y}$, we see that $$Q$$ is maximized in $$\theta$$ by solving $\sum_{i,j} E_{\theta'}(t(Y_{ij} \mid Z_i) \mid \mathbf{Y} = \mathbf{y}) = \sum_{i} m_i E_{\theta'}(\nabla \kappa(\theta \mid Z_i) \mid \mathbf{Y} = \mathbf{y}).$

By using Example 6.2 we see that $\kappa(\theta \mid Z_i) = \frac{(\theta_1 + \theta_3 Z_i)^2}{4\theta_2} - \frac{1}{2}\log \theta_2,$ hence

$\nabla \kappa(\theta \mid Z_i) = \frac{1}{2\theta_2} \left(\begin{array}{c} \theta_1 + \theta_3 Z_i \\ - \frac{(\theta_1 + \theta_3 Z_i)^2}{2\theta_2} - 1 \\ \theta_1 Z_i + \theta_3 Z_i^2 \end{array}\right) = \left(\begin{array}{c} \beta_0 + \nu Z_i \\ - (\beta_0 + \nu Z_i)^2 - \sigma^2 \\ \beta_0 Z_i + \nu Z_i^2 \end{array}\right).$
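As a sanity check on this gradient identity, one can compare the closed-form expression with a numerical derivative of $\kappa(\theta \mid Z_i)$. The sketch below assumes the reparametrization $\theta_1 = \beta_0/\sigma^2$, $\theta_2 = 1/(2\sigma^2)$, $\theta_3 = \nu/\sigma^2$ implied by the identity above; all function names are ours.

```python
import numpy as np

def kappa(theta, z):
    # kappa(theta | z) = (theta1 + theta3 z)^2 / (4 theta2) - (1/2) log theta2
    t1, t2, t3 = theta
    return (t1 + t3 * z) ** 2 / (4 * t2) - 0.5 * np.log(t2)

def grad_kappa_closed_form(theta, z):
    # Right-hand side of the gradient identity, written in the
    # mean-value parameters (assumed reparametrization:
    # theta1 = beta0/sigma^2, theta2 = 1/(2 sigma^2), theta3 = nu/sigma^2).
    t1, t2, t3 = theta
    sigma2 = 1.0 / (2.0 * t2)
    beta0, nu = t1 * sigma2, t3 * sigma2
    m = beta0 + nu * z
    return np.array([m, -m**2 - sigma2, beta0 * z + nu * z**2])

def grad_kappa_numeric(theta, z, h=1e-6):
    # Central finite differences in each coordinate of theta.
    g = np.zeros(3)
    for k in range(3):
        tp, tm = np.array(theta, float), np.array(theta, float)
        tp[k] += h
        tm[k] -= h
        g[k] = (kappa(tp, z) - kappa(tm, z)) / (2 * h)
    return g

theta = np.array([0.5, 0.8, -0.3])
diff = np.abs(grad_kappa_closed_form(theta, 1.7) - grad_kappa_numeric(theta, 1.7))
```

The two gradients should agree up to finite-difference error.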

Therefore, $$Q$$ is maximized by solving the equation

$\sum_{i,j} \left(\begin{array}{c} y_{ij} \\ - y_{ij}^2 \\ E_{\theta'}(Z_i \mid \mathbf{Y} = \mathbf{y}) y_{ij} \end{array}\right) = \sum_{i} m_i \left(\begin{array}{c} \beta_0 + \nu E_{\theta'}(Z_i \mid \mathbf{Y} = \mathbf{y}) \\ - E_{\theta'}((\beta_0 + \nu Z_i)^2 \mid \mathbf{Y} = \mathbf{y}) - \sigma^2 \\ \beta_0 E_{\theta'}(Z_i \mid \mathbf{Y} = \mathbf{y}) + \nu E_{\theta'}(Z_i^2 \mid \mathbf{Y} = \mathbf{y}) \end{array}\right).$

Introducing first $$\xi_i = E_{\theta'}(Z_i \mid \mathbf{Y} = \mathbf{y})$$ and $$\zeta_i = E_{\theta'}(Z_i^2 \mid \mathbf{Y} = \mathbf{y})$$, we can rewrite the first and last of the three equations as the linear system $\left(\begin{array}{cc} \sum_{i} m_i & \sum_{i} m_i\xi_i \\ \sum_{i} m_i\xi_i & \sum_{i} m_i\zeta_i \end{array}\right) \left(\begin{array}{c} \beta_0 \\ \nu \end{array}\right) = \left(\begin{array}{c} \sum_{i,j} y_{ij} \\ \sum_{i,j} \xi_i y_{ij} \end{array}\right).$

Plugging the solution for $$\beta_0$$ and $$\nu$$ into the second equation, we find $\sigma^2 = \frac{1}{\sum_{i} m_i}\left(\sum_{i,j} y_{ij}^2 - \sum_{i} m_i(\beta_0^2 + \nu^2 \zeta_i + 2 \beta_0 \nu \xi_i)\right).$
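This M-step thus reduces to a $2 \times 2$ linear solve followed by a plug-in formula for $\sigma^2$. A minimal Python sketch (function and variable names are ours; `y` is a list of per-group observation arrays, and `xi`, `zeta` hold the conditional moments $\xi_i$, $\zeta_i$):

```python
import numpy as np

def m_step(y, xi, zeta):
    """M-step for the mixed effects model.

    y    : list of arrays, y[i] holds the m_i observations of group i
    xi   : array of xi_i   = E(Z_i | Y = y)
    zeta : array of zeta_i = E(Z_i^2 | Y = y)
    """
    m = np.array([len(yi) for yi in y], dtype=float)   # group sizes m_i
    sy = np.array([yi.sum() for yi in y])              # per-group sums
    # 2x2 linear system for (beta0, nu)
    A = np.array([[m.sum(), (m * xi).sum()],
                  [(m * xi).sum(), (m * zeta).sum()]])
    b = np.array([sy.sum(), (xi * sy).sum()])
    beta0, nu = np.linalg.solve(A, b)
    # plug-in formula for sigma^2
    ss = sum((yi ** 2).sum() for yi in y)
    sigma2 = (ss - (m * (beta0**2 + nu**2 * zeta
                         + 2 * beta0 * nu * xi)).sum()) / m.sum()
    return beta0, nu, sigma2
```

Note that if we substitute $\xi_i = z_i$ and $\zeta_i = z_i^2$, the same solve gives the complete-data maximum likelihood estimate.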

This solves the M-step of the EM algorithm for the mixed effects model. What remains is the E-step, which amounts to the computation of $$\xi_i$$ and $$\zeta_i$$. We know that the joint distribution of $$\mathbf{Y}$$ and $$\mathbf{Z}$$ is Gaussian, and we can easily compute the variances and covariances: $\mathrm{cov}(Z_i, Z_j) = \delta_{ij}$

$\mathrm{cov}(Y_{ij}, Y_{kl}) = \left\{ \begin{array}{ll} \nu^2 + \sigma^2 & \quad \text{if } i = k, j = l \\ \nu^2 & \quad \text{if } i = k, j \neq l \\ 0 & \quad \text{otherwise } \end{array} \right.$

$\mathrm{cov}(Z_i, Y_{kl}) = \left\{ \begin{array}{ll} \nu & \quad \text{if } i = k \\ 0 & \quad \text{otherwise } \end{array} \right.$

This gives a joint Gaussian distribution $\left( \begin{array}{c} \mathbf{Z} \\ \mathbf{Y} \end{array} \right) \sim \mathcal{N}\left( \left(\begin{array}{c} \mathbf{0} \\ \beta_0 \mathbf{1}\end{array} \right), \left(\begin{array}{cc} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{array}\right)\right).$
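Assembling the covariance blocks explicitly from the formulas above is straightforward. In the sketch below (names ours), $\Sigma_{11}$ is the identity, $\Sigma_{22}$ is block diagonal with blocks $\nu^2 \mathbf{1}\mathbf{1}^T + \sigma^2 I$, and $\Sigma_{12}$ has the value $\nu$ exactly where $Z_i$ and group $i$ meet:

```python
import numpy as np

def joint_cov(m, nu, sigma2):
    """Covariance blocks of (Z, Y) for group sizes m = [m_1, ..., m_n]."""
    n, N = len(m), sum(m)
    S11 = np.eye(n)                 # cov(Z_i, Z_j) = delta_ij
    S12 = np.zeros((n, N))
    S22 = np.zeros((N, N))
    pos = 0
    for i, mi in enumerate(m):
        # cov(Z_i, Y_il) = nu within group i, 0 otherwise
        S12[i, pos:pos + mi] = nu
        # within group i: nu^2 + sigma^2 on the diagonal, nu^2 off it
        S22[pos:pos + mi, pos:pos + mi] = (nu**2 * np.ones((mi, mi))
                                           + sigma2 * np.eye(mi))
        pos += mi
    return S11, S12, S22
```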

From this and the general formulas for computing conditional distributions in the multivariate Gaussian distribution we obtain $\mathbf{Z} \mid \mathbf{Y} \sim \mathcal{N}\left( \Sigma_{12} \Sigma_{22}^{-1}(\mathbf{Y} - \beta_0 \mathbf{1}), \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \right).$ The conditional means, $$\xi_i$$, are thus the coordinates of $$\Sigma_{12} \Sigma_{22}^{-1}(\mathbf{Y} - \beta_0 \mathbf{1})$$. The conditional second moments, $$\zeta_i$$, can be found as the diagonal elements of the conditional covariance matrix plus $$\xi_i^2$$.
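Since the groups are independent, the conditioning can be carried out group by group, and within a single group the inversion of $\Sigma_{22} = \nu^2 \mathbf{1}\mathbf{1}^T + \sigma^2 I$ collapses to scalar formulas by the Sherman–Morrison identity: $\xi_i = \frac{\nu}{\sigma^2 + m_i \nu^2} \sum_j (y_{ij} - \beta_0)$ and $\mathrm{var}(Z_i \mid \mathbf{Y} = \mathbf{y}) = 1 - \frac{m_i \nu^2}{\sigma^2 + m_i \nu^2}$. A Python sketch of this E-step (names ours):

```python
import numpy as np

def e_step(y, beta0, nu, sigma2):
    """E-step: xi_i = E(Z_i | Y = y), zeta_i = E(Z_i^2 | Y = y).

    Groups are independent, so Z_i is conditioned on Y_i only; the
    Sherman-Morrison identity reduces the Gaussian conditioning formulas
    to scalar arithmetic per group.
    """
    xi, zeta = [], []
    for yi in y:
        mi = len(yi)
        k = nu / (sigma2 + nu**2 * mi)          # Sigma12 Sigma22^{-1} row weight
        mean = k * np.sum(yi - beta0)           # conditional mean xi_i
        var = 1.0 - nu**2 * mi / (sigma2 + nu**2 * mi)  # conditional variance
        xi.append(mean)
        zeta.append(var + mean**2)              # second moment = var + mean^2
    return np.array(xi), np.array(zeta)
```

Alternating this E-step with the M-step derived above gives one full EM iteration for the mixed effects model.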