## 6.3 Regression models

Regression models are models of one variable, called the response, conditionally on one or more other variables, called the predictors. Typically we use models that assume conditional independence of the responses given the predictors, and a particularly convenient class of regression models is based on exponential families.

If we let \(y_i \in \mathbb{R}\) denote the \(i\)th response and \(x_i \in \mathbb{R}^p\)
the \(i\)th predictor we can consider the exponential family of models with joint density
\[f(\mathbf{y} \mid \mathbf{X}, \beta) = \prod_{i=1}^n e^{\theta^T t_i(y_i \mid x_i) - \log \varphi_i(\theta)}\]
for suitably chosen base measures. Here
\[\mathbf{X} = \left( \begin{array}{c} x_1^T \\ x_2^T \\ \vdots \\ x_{n-1}^T \\ x_n^T \end{array} \right)\]
is called the *model matrix* and is an \(n \times p\) matrix.

**Example 6.3 **If \(y_i \in \mathbb{N}_0\) are counts we often use a Poisson regression model
with point probabilities (density w.r.t. counting measure)
\[p_i(y_i \mid x_i) = e^{-\mu(x_i)} \frac{\mu(x_i)^{y_i}}{y_i!}.\]
If the mean depends on the predictors in a log-linear way, \(\log(\mu(x_i)) = x_i^T \beta\),
then
\[p_i(y_i \mid x_i) = e^{\beta^T x_i y_i - \exp( x_i^T \beta)} \frac{1}{y_i!}.\]
The factor \(1/y_i!\) can be absorbed into the base measure, and we recognize
this Poisson regression model as an exponential family with sufficient statistics
\[t_i(y_i) = x_i y_i\]
and
\[\log \varphi_i(\beta) = \exp( x_i^T \beta).\]

To implement numerical optimization algorithms for computing the maximum-likelihood estimate we note that \[t(\mathbf{y}) = \sum_{i=1}^n x_i y_i = \mathbf{X}^T \mathbf{y} \quad \text{and} \quad \kappa(\beta) = \sum_{i=1}^n e^{x_i^T \beta},\] that \[\nabla \kappa(\beta) = \sum_{i=1}^n x_i e^{x_i^T \beta},\] and that \[D^2 \kappa(\beta) = \sum_{i=1}^n x_i x_i^T e^{x_i^T \beta} = \mathbf{X}^T \mathbf{W}(\beta) \mathbf{X}\] where \(\mathbf{W}(\beta)\) is a diagonal matrix with \(\mathbf{W}(\beta)_{ii} = \exp(x_i^T \beta).\) All these formulas follow directly from the general formulas for exponential families.

To illustrate the use of a Poisson regression model we consider the following data set from a major Swedish supermarket chain. It contains the number of bags of frozen vegetables sold in a given week in a given store under a marketing campaign. A predicted number of items sold in a normal week (without a campaign) based on historic sales is included. It is of interest to recalibrate this number to give a good prediction of the number of items actually sold.

```
## sale normalSale store
## Min. : 1.00 Min. : 0.20 Length:1066
## 1st Qu.: 12.00 1st Qu.: 4.20 Class :character
## Median : 21.00 Median : 7.25 Mode :character
## Mean : 40.29 Mean : 11.72
## 3rd Qu.: 40.00 3rd Qu.: 12.25
## Max. :571.00 Max. :102.00
```

It is natural to model the number of sold items using a Poisson regression model, and we will consider the following simple log-linear model:

\[\log(E(\text{sale} \mid \text{normalSale})) = \beta_0 + \beta_1 \log(\text{normalSale}).\]

This is an example of an exponential family regression model as above with
a Poisson response distribution. It is a standard regression model that can
be fitted to data using the `glm`

function and using the formula interface
to specify how the response should depend on the normal sale. The model is
fitted by computing the maximum-likelihood estimate of the two parameters
\(\beta_0\) and \(\beta_1\).

```
# A Poisson regression model
pois_model_null <- glm(
sale ~ log(normalSale),
data = vegetables,
family = poisson
)
```

```
##
## Call:
## glm(formula = sale ~ log(normalSale), family = poisson, data = vegetables)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -16.218 -2.677 -0.864 1.324 21.730
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.461 0.015 97.2 <2e-16 ***
## log(normalSale) 0.922 0.005 184.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 51177 on 1065 degrees of freedom
## Residual deviance: 17502 on 1064 degrees of freedom
## AIC: 22823
##
## Number of Fisher Scoring iterations: 5
```

The fitted model of the expected sale as a function of the normal sale, \(x\), is

\[x \mapsto e^{1.46 + 0.92 \times \log(x)} = (4.31 \times x^{-0.08}) \times x.\]
This model roughly predicts a four-fold increase of the sale during a
campaign, though the effect decays with increasing \(x\). If the normal
sale is 10 items the factor is \(4.31 \times 10^{-0.08} = 3.58\), and if
the normal sale is 100 items the factor reduces to \(4.31 \times 100^{-0.08} = 2.98\).

We are not entirely satisfied with this model. It is fitted across all stores in the data set, and it is not obvious that the same effect should apply across all stores. Thus we would like to fit a model of the form

\[\log(E(\text{sale})) = \beta_{\text{store}} + \beta_1 \log(\text{normalSale})\]

instead. Fortunately, this is also straightforward using `glm`

.

```
## Note, variable store is a factor! The 'intercept' is removed
## in the formula to obtain a parametrization as above.
pois_model <- glm(
sale ~ store + log(normalSale) - 1,
data = vegetables,
family = poisson
)
```

We will not print all the individual store effects as there are 352 individual stores.

```
## Estimate Std. Error z value Pr(>|z|)
## store1 2.7182 0.12756 21.309 9.416e-101
## store10 4.2954 0.12043 35.668 1.254e-278
## store100 3.4866 0.12426 28.060 3.055e-173
## store101 3.3007 0.11791 27.993 1.989e-172
## log(normalSale) 0.2025 0.03127 6.474 9.536e-11
```

We should note that the coefficient of \(\log(\text{normalSale})\) has changed considerably, and the model is a somewhat different model now for the individual stores.

From a computational viewpoint the most important thing that has changed is that the number of parameters has increased a lot. In the first and simple model there are two parameters. In the second model there are 353 parameters. Computing the maximum-likelihood estimate is a considerably more difficult problem in the second case.