10/04/2024

Student t tests

Penguins

Penguins from the Palmer Archipelago in Antarctica, see palmerpenguins.

library(palmerpenguins)  # penguins data
library(knitr)           # for kable()
kable(head(penguins))
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Adelie Torgersen NA NA NA NA NA 2007
Adelie Torgersen 36.7 19.3 193 3450 female 2007
Adelie Torgersen 39.3 20.6 190 3650 male 2007

Subset

We keep only the penguins of species “Adelie”, and remove missing data.

adelie <- subset(penguins, species == "Adelie")
adelie <- na.omit(adelie)
kable(head(adelie))
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Adelie Torgersen 36.7 19.3 193 3450 female 2007
Adelie Torgersen 39.3 20.6 190 3650 male 2007
Adelie Torgersen 38.9 17.8 181 3625 female 2007

Group Comparison

Question: Are male Adelie penguins heavier than female Adelie penguins?

par(mfrow = c(1, 2), mar = c(5, 4, 2, 2) + 0.1)
plot(adelie$body_mass_g, col = adelie$sex)      ## scatter plot
boxplot(body_mass_g ~ sex, data = adelie)       ## box plot

Simple \(t\) test

mass_females <- subset(adelie, sex == "female")$body_mass_g
mass_males <- subset(adelie, sex == "male")$body_mass_g
t.test(mass_males, mass_females, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  mass_males and mass_females
## t = 13.126, df = 144, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  573.0666 776.2484
## sample estimates:
## mean of x mean of y 
##  4043.493  3368.836

Simple \(t\) test

Data:

  • \(n_f\) female adelie penguins, \(n_m\) male adelie penguins
  • \(y^f_{i}\): mass (g) of a female adelie penguin, \(1 \leq i \leq n_f\)
  • \(y^m_{j}\): mass (g) of a male adelie penguin, \(1 \leq j \leq n_m\)

Assumption:

  • \(y^f_{i} \sim \mathcal{N}(\mu_f, \sigma^2)\) iid
  • \(y^m_{j} \sim \mathcal{N}(\mu_m, \sigma^2)\) iid

Test:

\(\mathcal{H}_0\): \(\mu_f = \mu_m\) vs \(\mathcal{H}_1\): \(\mu_f \neq \mu_m\)

Simple \(t\) test

Test:

\(\mathcal{H}_0\): \(\mu_f = \mu_m\) vs \(\mathcal{H}_1\): \(\mu_f \neq \mu_m\)

Test Statistic:

\[ T = \frac{\bar{\mathbf{y}}^f - \bar{\mathbf{y}}^m}{s_p \sqrt{\frac{1}{n_f} + \frac{1}{n_m}}} \underset{\mathcal{H}_0}{\sim} \mathcal{T}_{n_f + n_m - 2} \] with the “pooled” standard deviation: \[ s_p = \sqrt{\frac{(n_f-1)s^2_{\mathbf{y}^f} + (n_m-1)s^2_{\mathbf{y}^m}}{n_f + n_m - 2}} \]
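As a sanity check, the pooled statistic can be recomputed by hand and compared with `t.test`; a minimal sketch on simulated masses (the numbers below are made up, not the penguin data):

```r
## Pooled two-sample t statistic, computed by hand (simulated data)
set.seed(42)
yf <- rnorm(30, mean = 3400, sd = 300)   # "female" masses (simulated)
ym <- rnorm(40, mean = 4000, sd = 300)   # "male" masses (simulated)
nf <- length(yf); nm <- length(ym)
sp <- sqrt(((nf - 1) * var(yf) + (nm - 1) * var(ym)) / (nf + nm - 2))
T_hand <- (mean(ym) - mean(yf)) / (sp * sqrt(1 / nf + 1 / nm))
T_test <- unname(t.test(ym, yf, var.equal = TRUE)$statistic)
all.equal(T_hand, T_test)                # TRUE: same statistic
```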

Simple \(t\) test

t.test(mass_males, mass_females, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  mass_males and mass_females
## t = 13.126, df = 144, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  573.0666 776.2484
## sample estimates:
## mean of x mean of y 
##  4043.493  3368.836

Reject the null hypothesis that \(\mu_f = \mu_m\): male and female Adelie penguins have different mean body masses.

Linear Regression

Model

Assumptions:

  • \(y^f_{i} \sim \mathcal{N}(\mu_f, \sigma^2)\) iid
  • \(y^m_{j} \sim \mathcal{N}(\mu_m, \sigma^2)\) iid

in other words:

  • \(y^f_{i} = \mu_f + \epsilon_i^f\), with \(\epsilon^f_i \sim \mathcal{N}(0, \sigma^2)\) iid
  • \(y^m_{j} = \mu_m + \epsilon_j^m\), with \(\epsilon^m_j \sim \mathcal{N}(0, \sigma^2)\) iid

Can we write this as a linear model “\(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\)”?

Model

Write: \[ \mathbf{y} = \begin{pmatrix} \mathbf{y}^f \\ \mathbf{y}^m \\ \end{pmatrix} \quad \text{and} \quad \boldsymbol{\epsilon} = \begin{pmatrix} \boldsymbol{\epsilon}^f \\ \boldsymbol{\epsilon}^m \\ \end{pmatrix} \] \(\mathbf{y}\) and \(\boldsymbol{\epsilon}\) vectors of size \(n = n_f + n_m\).

Then, for \(1 \leq i \leq n\): \[ y_i = \mu_k + \epsilon_i \quad \text{with} \quad \begin{cases} \mu_k = \mu_f & \text{if } 1 \leq i \leq n_f\\ \mu_k = \mu_m & \text{if } n_f+1 \leq i \leq n_f + n_m \end{cases} \]

What is the regression matrix \(\mathbf{X}\)?

Linear Model

\[ \text{Write:} \qquad \mathbf{X} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots\\ 0 & 1 \\ \end{pmatrix} \begin{matrix} \\ n_f\\ \\ \\ n_m\\ \\ \end{matrix} \qquad \text{and} \qquad \boldsymbol{\beta} = \begin{pmatrix} \mu_f \\ \mu_m \\ \end{pmatrix} \]

\[ \text{Then:} \qquad \mathbf{X}\boldsymbol{\beta} = \begin{pmatrix} \mu_f + 0\\ \vdots \\ \mu_f + 0\\ 0 + \mu_m\\ \vdots\\ 0 + \mu_m \\ \end{pmatrix} = \begin{pmatrix} \mu_f \\ \vdots \\ \mu_f \\ \mu_m \\ \vdots\\ \mu_m \\ \end{pmatrix} \begin{matrix} \\ n_f\\ \\ \\ n_m\\ \\ \end{matrix} \]

Linear Model

Hence, the model can be written as: \[ \begin{pmatrix} y^f_1 \\ \vdots\\ y^f_{n_f} \\ y^m_{1} \\ \vdots\\ y^m_{n_m} \\ \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots\\ 0 & 1 \\ \end{pmatrix} \begin{pmatrix} \mu_f \\ \mu_m \\ \end{pmatrix} + \begin{pmatrix} \epsilon^f_1 \\ \vdots\\ \epsilon^f_{n_f} \\ \epsilon^m_{1} \\ \vdots\\ \epsilon^m_{n_m} \\ \end{pmatrix} \]

i.e: \[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \text{with} \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n) \]

\(\to\) classical linear model with Gaussian assumption!
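This parametrization (one indicator per group, no intercept) can be fitted directly in R with `~ group + 0`; a sketch on simulated two-group data (not the penguin measurements):

```r
## Cell-means coding: beta = (mu_f, mu_m), no intercept (simulated data)
set.seed(1)
sex <- factor(rep(c("female", "male"), times = c(30, 40)))
y   <- rnorm(70, mean = ifelse(sex == "male", 4000, 3400), sd = 300)
fit0 <- lm(y ~ sex + 0)          # X = (1(female), 1(male))
coef(fit0)                       # estimated group means
all.equal(unname(coef(fit0)), as.vector(tapply(y, sex, mean)))  # TRUE
```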

Linear Model - With Intercept

\[ \text{Write:} \qquad \mathbf{X} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots\\ 1 & 1 \\ \end{pmatrix} \begin{matrix} \\ n_f\\ \\ \\ n_m\\ \\ \end{matrix} \qquad \text{and} \qquad \boldsymbol{\beta} = \begin{pmatrix} \mu_f \\ \mu_m - \mu_f \\ \end{pmatrix} \]

\[ \text{Then:} \qquad \mathbf{X}\boldsymbol{\beta} = \begin{pmatrix} \mu_f + 0\\ \vdots \\ \mu_f + 0\\ \mu_f + \mu_m - \mu_f\\ \vdots\\ \mu_f + \mu_m - \mu_f \\ \end{pmatrix} = \begin{pmatrix} \mu_f \\ \vdots \\ \mu_f \\ \mu_m \\ \vdots\\ \mu_m \\ \end{pmatrix} \begin{matrix} \\ n_f\\ \\ \\ n_m\\ \\ \end{matrix} \]

Linear Model - With Intercept

The model can also be written as: \[ \begin{pmatrix} y^f_1 \\ \vdots\\ y^f_{n_f} \\ y^m_{1} \\ \vdots\\ y^m_{n_m} \\ \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots\\ 1 & 1 \\ \end{pmatrix} \begin{pmatrix} \mu_f \\ \mu_m - \mu_f \\ \end{pmatrix} + \begin{pmatrix} \epsilon^f_1 \\ \vdots\\ \epsilon^f_{n_f} \\ \epsilon^m_{1} \\ \vdots\\ \epsilon^m_{n_m} \\ \end{pmatrix} \]

i.e: \[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \text{with} \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n) \]

\(\to\) parametrization used in R

Linear Model - \(t\) test

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \text{with} \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n) \]

As \[ \boldsymbol{\beta} = \begin{pmatrix} \mu_f \\ \mu_m - \mu_f \\ \end{pmatrix}, \]

the test \[ \mathcal{H}_0: \mu_f = \mu_m \quad \text{vs} \quad \mathcal{H}_1: \mu_f \neq \mu_m \]

becomes: \[ \mathcal{H}_0: \beta_2 = 0 \quad \text{vs} \quad \mathcal{H}_1: \beta_2 \neq 0 \]

\(\to\) \(t\)-test on the \(\beta_2\) coefficient!

Linear Model

fit <- lm(body_mass_g ~ sex, data = adelie)
model.matrix(fit)[1:10, ]
##    (Intercept) sexmale
## 1            1       1
## 2            1       0
## 3            1       0
## 4            1       0
## 5            1       1
## 6            1       0
## 7            1       1
## 8            1       0
## 9            1       1
## 10           1       1

Linear Model

summary(fit)
## 
## Call:
## lm(formula = body_mass_g ~ sex, data = adelie)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -718.49 -218.84  -18.84  225.00  731.51 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3368.84      36.34   92.69   <2e-16 ***
## sexmale       674.66      51.40   13.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 310.5 on 144 degrees of freedom
## Multiple R-squared:  0.5447, Adjusted R-squared:  0.5416 
## F-statistic: 172.3 on 1 and 144 DF,  p-value: < 2.2e-16

Simple \(t\) test

t.test(mass_males, mass_females, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  mass_males and mass_females
## t = 13.126, df = 144, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  573.0666 776.2484
## sample estimates:
## mean of x mean of y 
##  4043.493  3368.836

Test statistics

Simple \(t\) test: \[ T = \frac{\bar{\mathbf{y}}^f - \bar{\mathbf{y}}^m}{s_p \sqrt{\frac{1}{n_f} + \frac{1}{n_m}}} \underset{\mathcal{H}_0}{\sim} \mathcal{T}_{n_f + n_m - 2} \quad \text{with} \quad s_p = \sqrt{\frac{(n_f-1)s^2_{\mathbf{y}^f} + (n_m-1)s^2_{\mathbf{y}^m}}{n_f + n_m - 2}} \]

Linear Regression: \[ \frac{\hat{\beta}_k - \beta_k}{\sqrt{\hat{\sigma}^2 [(\mathbf{X}^T\mathbf{X})^{-1}]_{kk}}} \sim \mathcal{T}_{n-p} \] with \[ p = 2 \qquad n = n_f + n_m \qquad \beta_2 = 0 \text{ under } \mathcal{H}_0 \]

i.e. \[ \frac{\hat{\beta}_2}{\sqrt{\hat{\sigma}^2 [(\mathbf{X}^T\mathbf{X})^{-1}]_{22}}} \underset{\mathcal{H}_0}{\sim} \mathcal{T}_{n_f + n_m - 2} \]

Test statistics

Let’s show that: \[ \frac{\hat{\beta}_2}{\sqrt{\hat{\sigma}^2 [(\mathbf{X}^T\mathbf{X})^{-1}]_{22}}} \quad \text{with} \quad \mathbf{X} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots\\ 1 & 1 \\ \end{pmatrix} \begin{matrix} \\ n_f\\ \\ \\ n_m\\ \\ \end{matrix} \qquad \text{and} \qquad \boldsymbol{\beta} = \begin{pmatrix} \mu_f \\ \mu_m - \mu_f \\ \end{pmatrix} \]

is equal, up to sign (i.e. up to which sample is subtracted from which), to \[ \frac{\bar{\mathbf{y}}^f - \bar{\mathbf{y}}^m}{s_p \sqrt{\frac{1}{n_f} + \frac{1}{n_m}}} \quad \text{with} \quad s_p = \sqrt{\frac{(n_f-1)s^2_{\mathbf{y}^f} + (n_m-1)s^2_{\mathbf{y}^m}}{n_f + n_m - 2}} \]

Test statistics - Proof - 1/5

\[ \mathbf{X}^T\mathbf{X} = \begin{pmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ 0 & \cdots & 0 & 1 & \cdots & 1 \\ \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots\\ 1 & 1 \\ \end{pmatrix} = \begin{pmatrix} n & n_m \\ n_m & n_m \end{pmatrix} \]

Hence: \[ (\mathbf{X}^T\mathbf{X})^{-1} = \begin{pmatrix} n & n_m \\ n_m & n_m \end{pmatrix}^{-1} = \frac{1}{nn_m - n_m^2} \begin{pmatrix} n_m & -n_m \\ -n_m & n \end{pmatrix} \]

And: \[ [(\mathbf{X}^T\mathbf{X})^{-1}]_{22} = \frac{n}{nn_m - n_m^2} = \frac{n_f + n_m}{(n - n_m)n_m} = \frac{n_f + n_m}{n_f n_m} = \frac{1}{n_f} + \frac{1}{n_m} \]
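This identity is easy to verify numerically for any group sizes; a sketch with arbitrary \(n_f = 30\), \(n_m = 40\):

```r
## Check [(X^T X)^{-1}]_{22} = 1/n_f + 1/n_m numerically
nf <- 30; nm <- 40
X <- cbind(1, rep(c(0, 1), times = c(nf, nm)))  # intercept + male indicator
XtXinv <- solve(crossprod(X))                   # (X^T X)^{-1}
all.equal(XtXinv[2, 2], 1 / nf + 1 / nm)        # TRUE
```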

Test statistics - Proof - 2/5

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \]

But: \[ \mathbf{X}^{T}\mathbf{y} = \begin{pmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ 0 & \cdots & 0 & 1 & \cdots & 1 \\ \end{pmatrix} \mathbf{y} = \begin{pmatrix} \sum_{i = 1}^n y_i\\ \sum_{j = 1}^{n_m} y^m_j \end{pmatrix} \]

Hence: \[ \begin{aligned} \hat{\boldsymbol{\beta}} & = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} = \frac{1}{n_fn_m} \begin{pmatrix} n_m & -n_m \\ -n_m & n \end{pmatrix} \begin{pmatrix} \sum_{i = 1}^n y_i\\ \sum_{j = 1}^{n_m} y^m_j \end{pmatrix} \\ &= \frac{1}{n_fn_m} \begin{pmatrix} n_m\sum_{i = 1}^n y_i - n_m\sum_{j = 1}^{n_m} y^m_j\\ - n_m\sum_{i = 1}^{n} y_i + n_m\sum_{j = 1}^{n_m} y^m_j + n_f\sum_{j = 1}^{n_m} y^m_j\\ \end{pmatrix} \\ &= \frac{1}{n_fn_m} \begin{pmatrix} n_m \sum_{i = 1}^{n_f} y^f_i\\ - n_m\sum_{i = 1}^{n_f} y^f_i + n_f\sum_{j = 1}^{n_m} y^m_j\\ \end{pmatrix} \end{aligned} \]

Test statistics - Proof - 3/5

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \]

And: \[ \hat{\boldsymbol{\beta}} = \frac{1}{n_fn_m} \begin{pmatrix} n_m \sum_{i = 1}^{n_f} y^f_i\\ - n_m\sum_{i = 1}^{n_f} y^f_i + n_f\sum_{j = 1}^{n_m} y^m_j\\ \end{pmatrix} = \begin{pmatrix} \bar{\mathbf{y}}^f\\ -\bar{\mathbf{y}}^f + \bar{\mathbf{y}}^m\\ \end{pmatrix} \]

Note: \[ \text{since } \quad \boldsymbol{\beta} = \begin{pmatrix} \mu_f \\ \mu_m - \mu_f \\ \end{pmatrix} \quad \text{this result is coherent.} \]

Note: \[ \text{As} \quad \mathbf{x}^{i} = \begin{cases} (1~0) & \text{ if $i$ female} \\ (1~1) & \text{ if $i$ male} \\ \end{cases} \quad \text{we get:} \quad \hat{y}_i = \mathbf{x}^{i}\hat{\boldsymbol{\beta}} = \begin{cases} \bar{\mathbf{y}}^f & \text{ if $i$ female} \\ \bar{\mathbf{y}}^m & \text{ if $i$ male} \\ \end{cases} \]
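The closed form \(\hat{\boldsymbol{\beta}} = (\bar{\mathbf{y}}^f,\ \bar{\mathbf{y}}^m - \bar{\mathbf{y}}^f)\) can be checked against the normal equations; a sketch on simulated data:

```r
## beta_hat = (X^T X)^{-1} X^T y equals (mean_f, mean_m - mean_f)
set.seed(2)
nf <- 30; nm <- 40
yf <- rnorm(nf, 3400, 300); ym <- rnorm(nm, 4000, 300)
X  <- cbind(1, rep(c(0, 1), times = c(nf, nm)))
beta_hat <- solve(crossprod(X), crossprod(X, c(yf, ym)))
all.equal(as.vector(beta_hat), c(mean(yf), mean(ym) - mean(yf)))  # TRUE
```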

Test statistics - Proof - 4/5

\[ \hat{\sigma}^2 = \frac{1}{n - 2}\sum_{i = 1}^n (y_i - \hat{y}_i)^2 \]

Hence: \[ \hat{\sigma}^2 = \frac{1}{n_f + n_m - 2} \left[ \sum_{i = 1}^{n_f} (y^f_i - \bar{\mathbf{y}}^f)^2 + \sum_{i = 1}^{n_m} (y^m_i - \bar{\mathbf{y}}^m)^2 \right] \]

Recall: \[ s^2_p = \frac{(n_f-1)s^2_{\mathbf{y}^f} + (n_m-1)s^2_{\mathbf{y}^m}}{n_f + n_m - 2} \]

hence: \[ \hat{\sigma}^2 = s^2_p. \]
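Again a quick numerical check that `lm`'s residual variance matches the pooled variance (simulated data):

```r
## sigma_hat^2 from the regression equals the pooled variance s_p^2
set.seed(3)
nf <- 30; nm <- 40
yf <- rnorm(nf, 3400, 300); ym <- rnorm(nm, 4000, 300)
g  <- factor(rep(c("f", "m"), times = c(nf, nm)))
fit <- lm(c(yf, ym) ~ g)
s2p <- ((nf - 1) * var(yf) + (nm - 1) * var(ym)) / (nf + nm - 2)
all.equal(sigma(fit)^2, s2p)   # TRUE
```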

Test statistics - Proof - 5/5

\[ T = \frac{\hat{\beta}_2}{\sqrt{\hat{\sigma}^2 [(\mathbf{X}^T\mathbf{X})^{-1}]_{22}}} \]

but: \[ \hat{\beta}_2 = \bar{\mathbf{y}}^m - \bar{\mathbf{y}}^f, \qquad \hat{\sigma}^2 = s^2_p, \qquad [(\mathbf{X}^T\mathbf{X})^{-1}]_{22} = \frac{1}{n_f} + \frac{1}{n_m} \]

hence: \[ T = \frac{\bar{\mathbf{y}}^m - \bar{\mathbf{y}}^f}{s_p \sqrt{\frac{1}{n_f} + \frac{1}{n_m}}} \]

This is, up to sign (i.e. up to which sample is subtracted from which), the statistic of the simple two-sample \(t\) test: the \(t\)-test on the regression coefficient and the simple \(t\)-test are the same.
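The equivalence can also be observed directly in R: the \(t\) value of the group coefficient in `lm` matches the two-sample statistic (sketched here on simulated data; with the real penguins, compare the `summary(fit)` and `t.test` outputs above):

```r
## t value of the group coefficient = two-sample pooled t statistic
set.seed(4)
nf <- 30; nm <- 40
yf <- rnorm(nf, 3400, 300); ym <- rnorm(nm, 4000, 300)
g  <- factor(rep(c("f", "m"), times = c(nf, nm)))
t_lm <- summary(lm(c(yf, ym) ~ g))$coefficients["gm", "t value"]
t_tt <- unname(t.test(ym, yf, var.equal = TRUE)$statistic)
all.equal(t_lm, t_tt)   # TRUE
```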

Summary

Group comparison:

  • \(n_f\) female adelie penguins, \(n_m\) male adelie penguins
  • \(y^f_{i}\): mass (g) of a female adelie penguin, \(1 \leq i \leq n_f\), \(y^f_{i} \sim \mathcal{N}(\mu_f, \sigma^2)\) iid
  • \(y^m_{j}\): mass (g) of a male adelie penguin, \(1 \leq j \leq n_m\), \(y^m_{j} \sim \mathcal{N}(\mu_m, \sigma^2)\) iid
  • \(\mathcal{H}_0\): \(\mu_f = \mu_m\) vs \(\mathcal{H}_1\): \(\mu_f \neq \mu_m\)

is the same as a Linear model with:

  • \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\) with \(\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n)\)
  • \(\mathbf{X}_1 = \mathbb{1}_n\) and \(\mathbf{X}_2 = \mathbb{1}(male)\)
  • \(\beta_1 = \mu_f\) and \(\beta_2 = \mu_m - \mu_f\).
  • \(\mathcal{H}_0: \beta_2 = 0\) vs \(\mathcal{H}_1: \beta_2 \neq 0\)

\(\to\) what happens when there are more than two groups?

One-way ANOVA

Species

We keep only the female penguins of all species, and remove missing data.

females <- subset(penguins, sex == "female")
females <- na.omit(females)
kable(females[c(1, 2, 101, 102, 151, 152), ])
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Gentoo Biscoe 43.8 13.9 208 4300 female 2008
Gentoo Biscoe 43.2 14.5 208 4450 female 2008
Chinstrap Dream 46.9 16.6 192 2700 female 2008
Chinstrap Dream 46.2 17.5 187 3650 female 2008

Species

Question: Do the females of the three species have different body masses?

par(mfrow = c(1, 2), mar = c(5, 4, 2, 2) + 0.1)
plot(females$body_mass_g, col = females$species)     ## scatter plot
boxplot(body_mass_g ~ species, data = females)       ## box plot

\(t\) tests

Pairwise \(t\) tests:

pairwise.t.test(females$body_mass_g, females$species,
                p.adjust.method = "none")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  females$body_mass_g and females$species 
## 
##           Adelie Chinstrap
## Chinstrap 0.0066 -        
## Gentoo    <2e-16 <2e-16   
## 
## P value adjustment method: none

\(\to\) \({K \choose 2}\) tests: multiple testing problem.
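One standard remedy is to adjust the p-values for the number of comparisons; `pairwise.t.test` supports this through `p.adjust.method` (sketched on simulated groups; `"holm"` is one of the built-in methods, `"bonferroni"` another):

```r
## Adjusted pairwise comparisons to control the family-wise error rate
set.seed(5)
g <- factor(rep(c("a", "b", "c"), each = 30))
y <- rnorm(90, mean = c(0, 0, 1)[as.integer(g)])
pairwise.t.test(y, g, p.adjust.method = "holm")   # Holm-adjusted p values
```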

\(t\) tests - Model

Model:

  • \(n_s\) penguins of species \(s\), for \(s \in S = \{\text{adelie}, \text{chinstrap}, \text{gentoo}\}\)
  • \(y^s_{i}\): mass (g) of a female penguin of species \(s\), \(1 \leq i \leq n_s\)
  • \(y^s_{i} \sim \mathcal{N}(\mu_s, \sigma^2)\) iid for any species \(s\).

Pairwise Tests:

\(\mathcal{H}^{s_1,s_2}_0\): \(\mu_{s_1} = \mu_{s_2}\) vs \(\mathcal{H}^{s_1,s_2}_1\): \(\mu_{s_1} \neq \mu_{s_2}\)
for any pair \((s_1,s_2)\) of species.

Global Test: can we do

\(\mathcal{H}_0\): \(\mu_a = \mu_c = \mu_g\) vs \(\mathcal{H}_1\): \(\exists s_1, s_2 \in S ~|~ \mu_{s_1} \neq \mu_{s_2}\)?

One-way ANOVA

  • \(S\) a set of \(|S| = K\) groups;
  • \(\mathbf{y}\) the vector of all data, length \(n = \sum_{k=1}^K n_k\);
  • \(\mathbf{X}\) the matrix of size \(n \times K\), such that:
    • \(\mathbf{X}_1 = \mathbb{1}_{n}\) is the intercept (vector column)
    • \(\mathbf{X}_k = \mathbb{1}(S_k)\) the indicator of group \(k\), \(2 \leq k \leq K\).
      • \(x_{ik} = 1\) if individual \(i\) is in group \(k\), and \(0\) otherwise.
  • \(\boldsymbol{\beta}\) the vector of mean differences, of length \(K\):
    • \(\beta_1 = \mu_1\)
    • \(\beta_k = \mu_k - \mu_1\)
  • the first group is the reference

One-way ANOVA

\[ \begin{pmatrix} y^1_1 \\ \vdots\\ y^1_{n_1} \\ y^2_{1} \\ \vdots\\ y^2_{n_2} \\ \vdots\\ y^K_{1} \\ \vdots\\ y^K_{n_K} \\ \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \cdots & \vdots\\ 1 & 0 & 0 & \cdots & 0\\ 1 & 1 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \cdots & \vdots\\ 1 & 1 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \cdots & \vdots\\ 1 & 0 & 0 & \cdots & 1\\ \vdots & \vdots & \vdots & \cdots & \vdots\\ 1 & 0 & 0 & \cdots & 1\\ \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 - \mu_1 \\ \vdots\\ \mu_K - \mu_1 \\ \end{pmatrix} + \begin{pmatrix} \epsilon^1_1 \\ \vdots\\ \epsilon^1_{n_1} \\ \epsilon^2_{1} \\ \vdots\\ \epsilon^2_{n_2} \\ \vdots\\ \epsilon^K_{1} \\ \vdots\\ \epsilon^K_{n_K} \\ \end{pmatrix} \]

One-way ANOVA

Model:

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \text{with} \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n) \]

Note: Assuming that the groups are disjoint and non-empty, \(\text{rank}(\mathbf{X}) = K\).

This is equivalent to the following model: \[ \begin{aligned} y_i &= \beta_1 + \epsilon_i = \mu_1 + \epsilon_i & \text{ if } &i \text{ is in group } 1\\ y_i &= \beta_1 + \beta_k + \epsilon_i = \mu_k + \epsilon_i & \text{ if } &i \text{ is in group } k,\ k>1 \\ \end{aligned} \]

i.e. each group has a mean \(\mu_k\), and the errors are Gaussian iid.

Species

fit <- lm(body_mass_g ~ species, data = females)
model.matrix(fit)[c(1, 2, 101, 102, 151, 152), ]
##     (Intercept) speciesChinstrap speciesGentoo
## 1             1                0             0
## 2             1                0             0
## 101           1                0             1
## 102           1                0             1
## 151           1                1             0
## 152           1                1             0

Species

summary(fit)
## 
## Call:
## lm(formula = body_mass_g ~ species, data = females)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -827.21 -193.84   20.26  181.16  622.79 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3368.84      32.42 103.908  < 2e-16 ***
## speciesChinstrap   158.37      57.52   2.754  0.00657 ** 
## speciesGentoo     1310.91      48.72  26.904  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 277 on 162 degrees of freedom
## Multiple R-squared:  0.8292, Adjusted R-squared:  0.8271 
## F-statistic: 393.2 on 2 and 162 DF,  p-value: < 2.2e-16

One-way ANOVA - Coefficients

It is easy to show that:

\[ \hat{\beta}_1 = \bar{\mathbf{y}}^{s_1}\\ \hat{\beta}_k = \bar{\mathbf{y}}^{s_k} - \bar{\mathbf{y}}^{s_1} \text{ for } 2 \leq k \leq K\\ \] Hence:

  • \(\hat{\beta}_1\) is the estimator of the mean of the first group;
  • \(\hat{\beta}_k\) is the estimator of the difference between the mean of group \(k\) and the first group (\(k > 1\)).

One-way ANOVA - Coefficients

We have: \[ \boldsymbol{\beta} = \begin{pmatrix} \mu_{s_1}\\ \mu_{s_2} - \mu_{s_1}\\ \vdots\\ \mu_{s_K} - \mu_{s_1}\\ \end{pmatrix} \quad \text{and} \quad \hat{\boldsymbol{\beta}} = \begin{pmatrix} \bar{\mathbf{y}}^{s_1}\\ \bar{\mathbf{y}}^{s_2} - \bar{\mathbf{y}}^{s_1}\\ \vdots\\ \bar{\mathbf{y}}^{s_K} - \bar{\mathbf{y}}^{s_1}\\ \end{pmatrix} \]

  • The \(t\) test on the coefficient \(\beta_k\):

\[ \mathcal{H}^{k}_0: \beta_{k} = 0 \quad{ vs } \quad \mathcal{H}^{k}_1: \beta_{k} \neq 0 \]

is the same as the test of the difference between groups \(1\) & \(k\):

\[ \mathcal{H}^{1,k}_0: \mu_{1} = \mu_{k} \quad{ vs } \quad \mathcal{H}^{1,k}_1: \mu_{1} \neq \mu_{k} \]

One-way ANOVA - Global Test

We have: \[ \boldsymbol{\beta} = \begin{pmatrix} \mu_{s_1}\\ \mu_{s_2} - \mu_{s_1}\\ \vdots\\ \mu_{s_K} - \mu_{s_1}\\ \end{pmatrix} \quad \text{and} \quad \hat{\boldsymbol{\beta}} = \begin{pmatrix} \bar{\mathbf{y}}^{s_1}\\ \bar{\mathbf{y}}^{s_2} - \bar{\mathbf{y}}^{s_1}\\ \vdots\\ \bar{\mathbf{y}}^{s_K} - \bar{\mathbf{y}}^{s_1}\\ \end{pmatrix} \]

  • The global \(F\) test on the coefficients \(\beta_2, \dotsc, \beta_K\):

\[ \mathcal{H}_0: \beta_{2} = \beta_3 = \cdots = \beta_{K} = 0 \quad{ vs } \quad \mathcal{H}_1: \exists k | \beta_{k} \neq 0 \]

is the same as the global test of mean equality of all groups:

\[ \mathcal{H}_0: \mu_{1} = \mu_2 = \cdots = \mu_{K} \quad{ vs } \quad \mathcal{H}_1: \exists k, k' | \mu_{k} \neq \mu_{k'} \]

Species

fit <- lm(body_mass_g ~ species, data = females)
summary(fit)
## 
## Call:
## lm(formula = body_mass_g ~ species, data = females)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -827.21 -193.84   20.26  181.16  622.79 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3368.84      32.42 103.908  < 2e-16 ***
## speciesChinstrap   158.37      57.52   2.754  0.00657 ** 
## speciesGentoo     1310.91      48.72  26.904  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 277 on 162 degrees of freedom
## Multiple R-squared:  0.8292, Adjusted R-squared:  0.8271 
## F-statistic: 393.2 on 2 and 162 DF,  p-value: < 2.2e-16

One-way ANOVA - Parametrization

General Model

In general, we might write for observation \(i\) of group \(k\): \[ y_{i,k} = \mu + \beta_k + \epsilon_i \]

  • \(\mu\) would be the “grand mean” (intercept)
  • \(\beta_k\) the specific effect of group \(k\) (\(1 \leq k \leq K\)).

General Model Over-Parametrization

\[ y_{i,k} = \mu + \beta_k + \epsilon_i \]

The model has \(K+1\) parameters.

The associated regression matrix would be: \[ \mathbf{X} = (\mathbb{1}, \mathbb{1}(S_1), \dotsc, \mathbb{1}(S_K)) \]

\(\mathbf{X}\) is not of full rank: \(\mathbb{1} = \sum_{k=1}^K \mathbb{1}(S_k)\).

Example with two groups

\[ \mathbf{X} = \begin{pmatrix} 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots\\ 1 & 0 & 1 \\ \end{pmatrix} \]

We need some constraint on the parameters.

Constraint: Group 1 as reference

\[ y_{i,k} = \mu + \beta_k + \epsilon_i \]

In the preceding slides, we assumed \[\beta_1 = 0\]

  • group 1 has the “reference” mean \(\mu\)
  • group \(k>1\) has mean \(\mu + \beta_k\)
  • \(\beta_k = \mu_k - \mu_1\) is the difference between the means of groups \(k\) and 1.
  • \(\mathbf{X} = (\mathbb{1}, \mathbb{1}(S_2), \dotsc, \mathbb{1}(S_K))\) is full rank.

\(\to\) This is the constraint used by default in R.

Constraint: No intercept

\[ y_{i,k} = \mu + \beta_k + \epsilon_i \]

We can assume \[\mu = 0\]

  • \(\beta_k = \mu_k\) models the mean of group \(k\) directly.
  • \(\mathbf{X} = (\mathbb{1}(S_1), \dotsc, \mathbb{1}(S_K))\) is full rank.
  • estimators of \(\beta_k\) are changed, but not the estimator of \(\sigma^2\), nor the fitted values.
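In R this constraint is obtained by removing the intercept from the formula; a sketch on simulated groups:

```r
## "No intercept" constraint: coefficients are the group means themselves
set.seed(6)
g <- factor(rep(c("a", "b", "c"), each = 20))
y <- rnorm(60, mean = c(10, 12, 15)[as.integer(g)])
fit_ref <- lm(y ~ g)        # default: group "a" as reference
fit_0   <- lm(y ~ g + 0)    # mu = 0 constraint
all.equal(unname(coef(fit_0)), as.vector(tapply(y, g, mean)))  # TRUE
all.equal(fitted(fit_ref), fitted(fit_0))                      # TRUE
all.equal(sigma(fit_ref), sigma(fit_0))                        # TRUE
```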

Constraint: Null Sum

\[ y_{i,k} = \mu + \beta_k + \epsilon_i \]

We can assume \[ \sum_{k = 1}^K \beta_k= 0 \qquad \text{i.e.} \qquad \beta_K = - \sum_{k = 1}^{K-1} \beta_k \]

  • \(\mathbf{X} = (\mathbb{1}, \mathbb{1}(S_1) - \mathbb{1}(S_K), \dotsc, \mathbb{1}(S_{K-1}) - \mathbb{1}(S_K))\) is full rank.
  • estimators of \(\mu\), \(\beta_k\) are changed, but not the estimator of \(\sigma^2\), nor the fitted values.
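In R, the null-sum constraint corresponds to `contr.sum` contrasts; the fitted values and \(\hat{\sigma}\) are unchanged, only the coefficients are reparametrized (sketch on simulated groups):

```r
## Null-sum constraint via contr.sum: same fit, different coefficients
set.seed(7)
g <- factor(rep(c("a", "b", "c"), each = 20))
y <- rnorm(60, mean = c(10, 12, 15)[as.integer(g)])
fit_ref <- lm(y ~ g)                                    # reference coding
fit_sum <- lm(y ~ g, contrasts = list(g = contr.sum))   # sum coding
coef(fit_sum)                                 # reparametrized coefficients
all.equal(fitted(fit_ref), fitted(fit_sum))   # TRUE
all.equal(sigma(fit_ref), sigma(fit_sum))     # TRUE
```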

Constraint: Weighted Null Sum

\[ y_{i,k} = \mu + \beta_k + \epsilon_i \]

We can assume \[ \sum_{k = 1}^K n_k \beta_k= 0 \qquad \text{i.e.} \qquad \beta_K = -\frac{1}{n_K} \sum_{k = 1}^{K-1} n_k \beta_k \]

  • \(\mathbf{X} = (\mathbb{1}, \mathbb{1}(S_1) - \frac{n_1}{n_K}\mathbb{1}(S_K), \dotsc, \mathbb{1}(S_{K-1}) - \frac{n_{K-1}}{n_K}\mathbb{1}(S_K))\) is full rank.
  • estimators of \(\mu\), \(\beta_k\) are changed, but not the estimator of \(\sigma^2\), nor the fitted values.

One-way ANOVA table

One-way ANOVA - \(F\) test

\[ y_{i,k} = \mu + \beta_k + \epsilon_i \]

ANOVA Test: Does the grouping have any effect at all? I.e. \[ \begin{aligned} \mathcal{H}_0&: \beta_{1} = \cdots = \beta_K = 0 &\text{vs}\\ \mathcal{H}_1&: \exists~k\in \{1, \dotsc, K\} ~|~ \beta_k \neq 0 \end{aligned} \]

No matter which set of constraints we choose, the \(F\) statistic associated with the one-way ANOVA is:

\[ F = \frac{ \|\hat{\mathbf{y}} - \bar{y}\mathbb{1}\|^2 / (K - 1) }{ \|\mathbf{y} - \hat{\mathbf{y}}\|^2 / (n - K) } \underset{\mathcal{H}_0}{\sim} \mathcal{F}^{K - 1}_{n - K}. \]

\(\to\) Global \(F\) test on the coefficients of the linear regression.

ANOVA = “ANalysis Of VAriance”

One-way ANOVA - Global \(F\) Test

With R parametrization: \[ \boldsymbol{\beta} = \begin{pmatrix} \mu_{s_1}\\ \mu_{s_2} - \mu_{s_1}\\ \vdots\\ \mu_{s_K} - \mu_{s_1}\\ \end{pmatrix} \quad \text{and} \quad \hat{\boldsymbol{\beta}} = \begin{pmatrix} \bar{\mathbf{y}}^{s_1}\\ \bar{\mathbf{y}}^{s_2} - \bar{\mathbf{y}}^{s_1}\\ \vdots\\ \bar{\mathbf{y}}^{s_K} - \bar{\mathbf{y}}^{s_1}\\ \end{pmatrix} \]

  • The global \(F\) test on the coefficients \(\beta_2, \dotsc, \beta_K\) is:

\[ \mathcal{H}_0: \beta_{2} = \beta_3 = \cdots = \beta_{K} = 0 \quad{ vs } \quad \mathcal{H}_1: \exists k | \beta_{k} \neq 0 \]

and is the same as the global test of mean equality of all groups:

\[ \mathcal{H}_0: \mu_{1} = \mu_2 = \cdots = \mu_{K} \quad{ vs } \quad \mathcal{H}_1: \exists k, k' | \mu_{k} \neq \mu_{k'} \]

Species

fit <- lm(body_mass_g ~ species, data = females)
summary(fit)
## 
## Call:
## lm(formula = body_mass_g ~ species, data = females)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -827.21 -193.84   20.26  181.16  622.79 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3368.84      32.42 103.908  < 2e-16 ***
## speciesChinstrap   158.37      57.52   2.754  0.00657 ** 
## speciesGentoo     1310.91      48.72  26.904  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 277 on 162 degrees of freedom
## Multiple R-squared:  0.8292, Adjusted R-squared:  0.8271 
## F-statistic: 393.2 on 2 and 162 DF,  p-value: < 2.2e-16

One-way ANOVA - Table

\[ F = \frac{ \|\hat{\mathbf{y}} - \bar{y}\mathbb{1}\|^2 / (K - 1) }{ \|\mathbf{y} - \hat{\mathbf{y}}\|^2 / (n - K) } = \frac{ ESS / (K - 1) }{ RSS / (n - K) } \underset{\mathcal{H}_0}{\sim} \mathcal{F}^{K - 1}_{n - K}. \]

“Anova table”:

            Df          SS        Mean SS             F                                          \(p\) value
factor      \(K - 1\)   \(ESS\)   \(ESS / (K - 1)\)   \(\frac{ESS / (K - 1)}{RSS / (n - K)}\)    \(p\)
residuals   \(n - K\)   \(RSS\)   \(RSS / (n - K)\)
total       \(n - 1\)   \(TSS\)
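The table entries can be recomputed from the fitted model and compared with R's `anova`; a sketch on simulated groups:

```r
## One-way ANOVA F statistic recomputed from ESS and RSS
set.seed(8)
g <- factor(rep(c("a", "b", "c"), each = 20))
y <- rnorm(60, mean = c(10, 12, 15)[as.integer(g)])
fit <- lm(y ~ g)
n <- length(y); K <- nlevels(g)
ESS <- sum((fitted(fit) - mean(y))^2)   # explained sum of squares
RSS <- sum(residuals(fit)^2)            # residual sum of squares
F_hand <- (ESS / (K - 1)) / (RSS / (n - K))
all.equal(F_hand, anova(fit)[["F value"]][1])   # TRUE
```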

Species

fit <- lm(body_mass_g ~ species, data = females)
anova(fit)
## Analysis of Variance Table
## 
## Response: body_mass_g
##            Df   Sum Sq  Mean Sq F value    Pr(>F)    
## species     2 60350016 30175008  393.25 < 2.2e-16 ***
## Residuals 162 12430757    76733                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two-way ANOVA

Species

We keep all the penguins of all species and both sexes, and remove missing data.

penguins_nona <- na.omit(penguins)
kable(penguins_nona[c(1, 2, 151, 152, 271, 272), ])
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Gentoo Biscoe 47.6 14.5 215 5400 male 2007
Gentoo Biscoe 46.5 13.5 210 4550 female 2007
Chinstrap Dream 45.2 17.8 198 3950 female 2007
Chinstrap Dream 46.1 18.2 178 3250 female 2007

Species

Question: what is the effect of both species and sex on body mass?

par(mfrow = c(1, 2), mar = c(5, 4, 2, 2) + 0.1)
plot(penguins_nona$body_mass_g,                             ## scatter plot
     pch = as.numeric(penguins_nona$species),               ## point type = species
     col = penguins_nona$sex)                               ## color = sex
boxplot(body_mass_g ~ sex + species, data = penguins_nona,  ## box plot
        col = c("grey", "red")) 

Model

Observation \(i\) of group \(k\) in factor 1 and group \(l\) in factor 2 is written as: \[ y_{i,k,l} = \mu + \alpha_k + \beta_l + \gamma_{kl} + \epsilon_i \]

  • \(\mu\) the “grand mean” (intercept)
  • \(\alpha_k\) the specific effect of group \(k\) in factor 1 (\(1 \leq k \leq K\)).
  • \(\beta_l\) the specific effect of group \(l\) in factor 2 (\(1 \leq l \leq L\)).
  • \(\gamma_{kl}\) the interaction effect of groups \(k\) and \(l\).

The full model has regression matrix: \[ {\small \mathbf{X} = (\mathbb{1}, \mathbb{1}_{S_1}, \dotsc, \mathbb{1}_{S_K}, \mathbb{1}_{T_1}, \dotsc, \mathbb{1}_{T_L}, \mathbb{1}_{S_1, T_1}, \dotsc, \mathbb{1}_{S_K, T_L} ) } \]

\(\to\) \(1 + K + L + KL\) predictors \(\to\) not identifiable!

Constraint: Group 1 as reference

\[ y_{i,k,l} = \mu + \alpha_k + \beta_l + \gamma_{kl} + \epsilon_i \]

Assume, for all \(1\leq k \leq K\) and \(1\leq l \leq L\): \[ \alpha_1 = 0, \qquad \beta_1 = 0, \qquad \gamma_{1,l} = 0, \qquad \gamma_{k,1} = 0. \]

  • \(y_{i,1,1}\) has mean \(\mu\) (reference)
  • \(y_{i,1,l}\) has mean \(\mu + \beta_l\)
  • \(y_{i,k,1}\) has mean \(\mu + \alpha_k\)
  • \(y_{i,k,l}\) has mean \(\mu + \alpha_k + \beta_l + \gamma_{k,l}\) (\(k,l>1\))
  • \(1 + K + L\) constraints \(\to\) \(KL\) free parameters
  • \(\mathbf{X} = (\mathbb{1}, \mathbb{1}_{S_2}, \dotsc, \mathbb{1}_{S_K}, \mathbb{1}_{T_2}, \dotsc, \mathbb{1}_{T_L}, \mathbb{1}_{S_2, T_2}, \dotsc, \mathbb{1}_{S_K, T_L})\) is full rank

\(\to\) This is the constraint used by default in R.

Species and Sex

fit <- lm(body_mass_g ~  species * sex, data = penguins_nona) 
summary(fit)
## 
## Call:
## lm(formula = body_mass_g ~ species * sex, data = penguins_nona)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -827.21 -213.97   11.03  206.51  861.03 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               3368.84      36.21  93.030  < 2e-16 ***
## speciesChinstrap           158.37      64.24   2.465  0.01420 *  
## speciesGentoo             1310.91      54.42  24.088  < 2e-16 ***
## sexmale                    674.66      51.21  13.174  < 2e-16 ***
## speciesChinstrap:sexmale  -262.89      90.85  -2.894  0.00406 ** 
## speciesGentoo:sexmale      130.44      76.44   1.706  0.08886 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 309.4 on 327 degrees of freedom
## Multiple R-squared:  0.8546, Adjusted R-squared:  0.8524 
## F-statistic: 384.3 on 5 and 327 DF,  p-value: < 2.2e-16

Other constraint

\[ y_{i,k,l} = \mu + \alpha_k + \beta_l + \gamma_{kl} + \epsilon_i \]

Interactions only: for all \(1\leq k \leq K\) and \(1\leq l \leq L\): \[ \mu = 0,\qquad \alpha_k = 0, \qquad \beta_l = 0. \]

Null sum: \[ \sum_{k=1}^K\alpha_k = 0, \qquad \sum_{l=1}^L\beta_l = 0, \qquad \sum_{k=1}^K\gamma_{kl} = 0, \qquad \sum_{l=1}^L\gamma_{kl} = 0. \]

\(\to\) changes the coefficient estimates, but not the variance estimate, nor the fitted values.

Nested F Test - Reminder

Recall that (from CM5), in general, to test: \[ \begin{aligned} \mathcal{H}_0&: \beta_{p_0 + 1} = \cdots = \beta_p = 0 &\text{vs}\\ \mathcal{H}_1&: \exists~k\in \{p_0+1, \dotsc, p\} ~|~ \beta_k \neq 0 \end{aligned} \]

We can use the statistic: \[ F = \frac{ \|\hat{\mathbf{y}} - \hat{\mathbf{y}}_0\|^2 / (p - p_0) }{ \|\mathbf{y} - \hat{\mathbf{y}}\|^2 / (n - p) } = \frac{n - p}{p - p_0}\frac{RSS_0 - RSS}{RSS} \underset{\mathcal{H}_0}{\sim} \mathcal{F}^{p - p_0}_{n - p}. \]

Writing \(\mathbf{X}_{p_0} = (\mathbf{X}_1, \dotsc, \mathbf{X}_{p_0})\), this is the same as testing: \[ \begin{aligned} \mathcal{H}_0&: \mathbf{P}^{\mathbf{X}_{p_0}}\mathbf{y} = \mathbf{P}^{\mathbf{X}}\mathbf{y} &\text{vs}\\ \mathcal{H}_1&: \mathbf{P}^{\mathbf{X}_{p_0}}\mathbf{y} \neq \mathbf{P}^{\mathbf{X}}\mathbf{y} \end{aligned} \]

Nested F Test - General

Take any two nested subsets of indexes \(\eta_1 \subset \eta_2\), and write: \[ \begin{aligned} \mathbf{X}_{\eta_k} &= (\mathbf{X}_i)_{i\in\eta_k} & \mathbf{P}^{\mathbf{X}_{\eta_k}}\mathbf{y} &= \hat{\mathbf{y}}_{\eta_k} & \mathbf{P}^{\mathbf{X}}\mathbf{y} &= \hat{\mathbf{y}} \end{aligned} \]

To test: \[ \begin{aligned} \mathcal{H}_0&: \text{"Larger model 2 does not bring any information compared to model 1"}\\ \mathcal{H}_1&: \text{"Larger model 2 does bring some information compared to model 1"} \end{aligned} \]

i.e. \[ \begin{aligned} \mathcal{H}_0&: \mathbf{P}^{\mathbf{X}_{\eta_1}}\mathbf{y} = \mathbf{P}^{\mathbf{X}_{\eta_2}}\mathbf{y} &\text{v.s.}&& \mathcal{H}_1&: \mathbf{P}^{\mathbf{X}_{\eta_1}}\mathbf{y} \neq \mathbf{P}^{\mathbf{X}_{\eta_2}}\mathbf{y} \end{aligned} \]

We can use: \[ \begin{aligned} F &= \frac{ \|\hat{\mathbf{y}}_{\eta_2} - \hat{\mathbf{y}}_{\eta_1}\|^2 / (|\eta_2| - |\eta_1|) }{ \|\mathbf{y} - \hat{\mathbf{y}}\|^2 / (n - p) }\\ &= \frac{ 1 }{ \hat{\sigma}^2 } (RSS_{\eta_1} - RSS_{\eta_2}) / (|\eta_2| - |\eta_1|) \underset{\mathcal{H}_0}{\sim} \mathcal{F}^{|\eta_2| - |\eta_1|}_{n - p}. \end{aligned} \]
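In R, this general nested \(F\) test is what `anova()` computes when given two fitted models; a minimal sketch on simulated data (hypothetical predictors `x1`, `x2`), checking the statistic against the RSS formula above:

```r
set.seed(42)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 + rnorm(n)        ## x2 is truly inactive

fit1 <- lm(y ~ x1)                ## smaller model, eta_1
fit2 <- lm(y ~ x1 + x2)           ## larger model, eta_2 (eta_1 subset of eta_2)

## F from the RSS formula, with |eta_2| - |eta_1| = 1 and p = 3
rss1 <- sum(residuals(fit1)^2)
rss2 <- sum(residuals(fit2)^2)
F_manual <- (rss1 - rss2) / (rss2 / (n - 3))

anova(fit1, fit2)                 ## reports the same F statistic
```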

Anova model

Model: \[ y_{i,k,l} = \mu + \alpha_k + \beta_l + \gamma_{kl} + \epsilon_i \] with, for all \(1\leq k \leq K\) and \(1\leq l \leq L\): \[ \alpha_1 = 0, \qquad \beta_1 = 0, \qquad \gamma_{1,l} = 0, \qquad \gamma_{k,1} = 0. \] (\(\to KL\) free parameters in total.)

Tests:

  • Is factor 1 useful?
  • Is factor 2 useful?
  • Is the interaction between factors 1 and 2 useful?

Nested F Test - Anova - Factor 1

To test: \[ \begin{aligned} \mathcal{H}_0&: \mathbf{P}^{\mathbf{X}_{\mu}}\mathbf{y} = \mathbf{P}^{\mathbf{X}_{\mu,\alpha}}\mathbf{y} &\text{v.s.}&& \mathcal{H}_1&: \mathbf{P}^{\mathbf{X}_{\mu}}\mathbf{y} \neq \mathbf{P}^{\mathbf{X}_{\mu,\alpha}}\mathbf{y} \end{aligned} \] i.e. \(\mathcal{H}_0\): “factor 1 does not bring any information compared to intercept alone”.

We can use: \[ \begin{aligned} F &= \frac{n - KL}{K - 1}\frac{RSS_{\mu} - RSS_{\mu, \alpha}}{RSS}\\ &= \frac{R(\alpha | \mu) / (K - 1)}{\hat{\sigma}^2}\\ &\underset{\mathcal{H}_0}{\sim} \mathcal{F}^{K - 1}_{n - KL}. \end{aligned} \]

Nested F Test - Anova - Factor 2

To test: \[ \begin{aligned} \mathcal{H}_0&: \mathbf{P}^{\mathbf{X}_{\mu,\alpha}}\mathbf{y} = \mathbf{P}^{\mathbf{X}_{\mu,\alpha, \beta}}\mathbf{y} &\text{v.s.}&& \mathcal{H}_1&: \mathbf{P}^{\mathbf{X}_{\mu,\alpha}}\mathbf{y} \neq \mathbf{P}^{\mathbf{X}_{\mu,\alpha,\beta}}\mathbf{y} \end{aligned} \] i.e. \(\mathcal{H}_0\): “factor 2 does not bring any information compared to factor 1”.

We can use: \[ \begin{aligned} F &= \frac{n - KL}{L - 1}\frac{RSS_{\mu, \alpha} - RSS_{\mu, \alpha, \beta}}{RSS}\\ &= \frac{R(\beta | \mu, \alpha) / (L - 1)}{\hat{\sigma}^2}\\ &\underset{\mathcal{H}_0}{\sim} \mathcal{F}^{L - 1}_{n - KL}. \end{aligned} \]

Nested F Test - Anova - Interaction

To test: \[ \begin{aligned} \mathcal{H}_0&: \mathbf{P}^{\mathbf{X}_{\mu,\alpha,\beta}}\mathbf{y} = \mathbf{P}^{\mathbf{X}_{\mu,\alpha, \beta,\gamma}}\mathbf{y} &\text{v.s.}&& \mathcal{H}_1&: \mathbf{P}^{\mathbf{X}_{\mu,\alpha,\beta}}\mathbf{y} \neq \mathbf{P}^{\mathbf{X}_{\mu,\alpha,\beta,\gamma}}\mathbf{y} \end{aligned} \] i.e. \(\mathcal{H}_0\): “Interactions do not bring any information compared to factors 1 and 2”.

We can use: \[ \begin{aligned} F &= \frac{n - KL}{(L - 1)(K - 1)}\frac{RSS_{\mu, \alpha, \beta} - RSS_{\mu, \alpha, \beta, \gamma}}{RSS}\\ &= \frac{R(\gamma | \mu, \alpha, \beta) / ((K - 1)(L - 1))}{\hat{\sigma}^2}\\ &\underset{\mathcal{H}_0}{\sim} \mathcal{F}^{(K - 1)(L - 1)}_{n - KL}. \end{aligned} \]

Two-way ANOVA - Type I Table

“Type I” Anova table:

Df SS Mean SS F \(p\) value
factor 1 \(K - 1\) \(R(\alpha|\mu)\) \(R(\alpha|\mu) / (K - 1)\) \(\hat{\sigma}^{-2} R(\alpha|\mu) / (K - 1)\) \(p\)
factor 2 \(L - 1\) \(R(\beta|\mu, \alpha)\) \(R(\beta|\mu, \alpha) / (L - 1)\) \(\hat{\sigma}^{-2} R(\beta|\mu, \alpha) / (L - 1)\) \(p\)
interactions \((K - 1)(L - 1)\) \(R(\gamma|\mu, \alpha, \beta)\) \(R(\gamma|\mu, \alpha, \beta) / ((K-1)(L-1))\) \(\hat{\sigma}^{-2} R(\gamma|\mu, \alpha, \beta) / ((K-1)(L-1))\) \(p\)
residuals \(n - KL\) \(RSS\) \(\hat{\sigma}^2 = RSS / (n - KL)\)
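The sequential sums of squares \(R(\cdot\,|\,\cdot)\) of this table can be recovered from nested fits by hand; a minimal sketch on simulated two-factor data (hypothetical factors `f1`, `f2`):

```r
set.seed(3)
n <- 60
d <- data.frame(f1 = factor(sample(1:3, n, replace = TRUE)),
                f2 = factor(sample(1:2, n, replace = TRUE)))
d$y <- rnorm(n, mean = as.integer(d$f1) + as.integer(d$f2))

rss <- function(f) sum(residuals(f)^2)
fit0 <- lm(y ~ 1, data = d)         ## mu
fit1 <- lm(y ~ f1, data = d)        ## mu, alpha
fit2 <- lm(y ~ f1 + f2, data = d)   ## mu, alpha, beta
fit3 <- lm(y ~ f1 * f2, data = d)   ## mu, alpha, beta, gamma

## Sequential ("Type I") sums of squares
R_alpha <- rss(fit0) - rss(fit1)    ## R(alpha | mu)
R_beta  <- rss(fit1) - rss(fit2)    ## R(beta  | mu, alpha)
R_gamma <- rss(fit2) - rss(fit3)    ## R(gamma | mu, alpha, beta)

anova(fit3)                         ## same SS, in the same order
```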

Species and Sex - Type I

fit <- lm(body_mass_g ~  species * sex, data = penguins_nona) 
anova(fit)
## Analysis of Variance Table
## 
## Response: body_mass_g
##              Df    Sum Sq  Mean Sq F value    Pr(>F)    
## species       2 145190219 72595110 758.358 < 2.2e-16 ***
## sex           1  37090262 37090262 387.460 < 2.2e-16 ***
## species:sex   2   1676557   838278   8.757 0.0001973 ***
## Residuals   327  31302628    95727                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sex and Species - Type I

fit <- lm(body_mass_g ~  sex * species, data = penguins_nona) 
anova(fit)
## Analysis of Variance Table
## 
## Response: body_mass_g
##              Df    Sum Sq  Mean Sq F value    Pr(>F)    
## sex           1  38878897 38878897 406.145 < 2.2e-16 ***
## species       2 143401584 71700792 749.016 < 2.2e-16 ***
## sex:species   2   1676557   838278   8.757 0.0001973 ***
## Residuals   327  31302628    95727                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two-way ANOVA - Type I Table

“Type I” Anova table:

Df SS Mean SS F \(p\) value
factor 1 \(K - 1\) \(R(\alpha|\mu)\) \(R(\alpha|\mu) / (K - 1)\) \(\hat{\sigma}^{-2} R(\alpha|\mu) / (K - 1)\) \(p\)
factor 2 \(L - 1\) \(R(\beta|\mu, \alpha)\) \(R(\beta|\mu, \alpha) / (L - 1)\) \(\hat{\sigma}^{-2} R(\beta|\mu, \alpha) / (L - 1)\) \(p\)
interactions \((K - 1)(L - 1)\) \(R(\gamma|\mu, \alpha, \beta)\) \(R(\gamma|\mu, \alpha, \beta) / ((K-1)(L-1))\) \(\hat{\sigma}^{-2} R(\gamma|\mu, \alpha, \beta) / ((K-1)(L-1))\) \(p\)
residuals \(n - KL\) \(RSS\) \(\hat{\sigma}^2 = RSS / (n - KL)\)

Attention: The order of the factors matters!

Two-way ANOVA - Type II Table

“Type II” Anova table:

Df SS Mean SS F \(p\) value
factor 1 \(K - 1\) \(R(\alpha|\mu, \beta)\) \(R(\alpha|\mu, \beta) / (K - 1)\) \(\hat{\sigma}^{-2} R(\alpha|\mu, \beta) / (K - 1)\) \(p\)
factor 2 \(L - 1\) \(R(\beta|\mu, \alpha)\) \(R(\beta|\mu, \alpha) / (L - 1)\) \(\hat{\sigma}^{-2} R(\beta|\mu, \alpha) / (L - 1)\) \(p\)
interactions \((K - 1)(L - 1)\) \(R(\gamma|\mu, \alpha, \beta)\) \(R(\gamma|\mu, \alpha, \beta) / ((K-1)(L-1))\) \(\hat{\sigma}^{-2} R(\gamma|\mu, \alpha, \beta) / ((K-1)(L-1))\) \(p\)
residuals \(n - KL\) \(RSS\) \(\hat{\sigma}^2 = RSS / (n - KL)\)

Symmetric in factors 1 and 2.
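Type II sums of squares can likewise be computed by hand, as the RSS drop when each main effect is added last; a minimal sketch on simulated unbalanced data (hypothetical factors `f1`, `f2`), where Type I and Type II disagree for the factor fitted first:

```r
set.seed(4)
n <- 50
d <- data.frame(f1 = factor(sample(1:3, n, replace = TRUE, prob = c(.5, .3, .2))),
                f2 = factor(sample(1:2, n, replace = TRUE, prob = c(.7, .3))))
d$y <- rnorm(n, mean = as.integer(d$f1) - as.integer(d$f2))

rss <- function(f) sum(residuals(f)^2)
## Type II: each main effect adjusted for the other one
R_alpha_II <- rss(lm(y ~ f2, data = d)) - rss(lm(y ~ f1 + f2, data = d))
R_beta_II  <- rss(lm(y ~ f1, data = d)) - rss(lm(y ~ f1 + f2, data = d))

## Type I SS for f1 (fitted first) differs on unbalanced data
R_alpha_I <- rss(lm(y ~ 1, data = d)) - rss(lm(y ~ f1, data = d))
c(TypeI = R_alpha_I, TypeII = R_alpha_II)
```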

Sex and Species - Type II

fit <- lm(body_mass_g ~  sex * species, data = penguins_nona) 
car::Anova(fit)
## Anova Table (Type II tests)
## 
## Response: body_mass_g
##                Sum Sq  Df F value    Pr(>F)    
## sex          37090262   1 387.460 < 2.2e-16 ***
## species     143401584   2 749.016 < 2.2e-16 ***
## sex:species   1676557   2   8.757 0.0001973 ***
## Residuals    31302628 327                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Bill Length - Type II

fit <- lm(bill_length_mm ~  sex * species, data = penguins_nona) 
car::Anova(fit)
## Anova Table (Type II tests)
## 
## Response: bill_length_mm
##             Sum Sq  Df  F value Pr(>F)    
## sex         1135.7   1 211.8066 <2e-16 ***
## species     6975.6   2 650.4786 <2e-16 ***
## sex:species   24.5   2   2.2841 0.1035    
## Residuals   1753.3 327                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Bill Length - Type II - No interaction

fit <- lm(bill_length_mm ~  sex + species, data = penguins_nona) 
car::Anova(fit)
## Anova Table (Type II tests)
## 
## Response: bill_length_mm
##           Sum Sq  Df F value    Pr(>F)    
## sex       1135.7   1  210.17 < 2.2e-16 ***
## species   6975.6   2  645.44 < 2.2e-16 ***
## Residuals 1777.8 329                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANCOVA

Penguins

Question: what is the relationship between bill length and depth, controlling for the species?

One line fits all

fit_all <- lm(bill_length_mm ~ bill_depth_mm, data = females)
summary(fit_all)
## 
## Call:
## lm(formula = bill_length_mm ~ bill_depth_mm, data = females)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.0745  -3.0689  -0.7609   2.6776  17.5034 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    61.2214     3.1966  19.152  < 2e-16 ***
## bill_depth_mm  -1.1643     0.1935  -6.018 1.13e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.449 on 163 degrees of freedom
## Multiple R-squared:  0.1818, Adjusted R-squared:  0.1768 
## F-statistic: 36.22 on 1 and 163 DF,  p-value: 1.128e-08

One line fits all

\[ y_{i} = \mu + \beta x_{i} + \epsilon_i \]

Take species into account

fit_species <- lm(bill_length_mm ~ bill_depth_mm + species, data = females)
summary(fit_species)
## 
## Call:
## lm(formula = bill_length_mm ~ bill_depth_mm + species, data = females)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9886 -1.4503 -0.0603  1.4476 11.2797 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       25.0447     3.9377   6.360 1.99e-09 ***
## bill_depth_mm      0.6930     0.2230   3.108  0.00222 ** 
## speciesChinstrap   9.3393     0.4648  20.092  < 2e-16 ***
## speciesGentoo     10.6515     0.8510  12.516  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.238 on 161 degrees of freedom
## Multiple R-squared:  0.7954, Adjusted R-squared:  0.7916 
## F-statistic: 208.7 on 3 and 161 DF,  p-value: < 2.2e-16

Take species into account

\[ y_{i,k} = \mu + \alpha_k + \beta x_{i} + \epsilon_i \]

Take species into account

fit_species <- lm(bill_length_mm ~ bill_depth_mm + species, data = females)
car::Anova(fit_species)
## Anova Table (Type II tests)
## 
## Response: bill_length_mm
##                Sum Sq  Df  F value    Pr(>F)    
## bill_depth_mm   48.41   1   9.6624  0.002224 ** 
## species       2419.64   2 241.4533 < 2.2e-16 ***
## Residuals      806.70 161                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interactions

fit_int <- lm(bill_length_mm ~ bill_depth_mm + species + bill_depth_mm:species,
              data = females)
fit_int
## 
## Call:
## lm(formula = bill_length_mm ~ bill_depth_mm + species + bill_depth_mm:species, 
##     data = females)
## 
## Coefficients:
##                    (Intercept)                   bill_depth_mm  
##                        31.1671                          0.3456  
##               speciesChinstrap                   speciesGentoo  
##                        -2.5348                         -8.8729  
## bill_depth_mm:speciesChinstrap     bill_depth_mm:speciesGentoo  
##                         0.6745                          1.2887

Interactions

\[ y_{i,k} = \mu + \alpha_k + \beta_k x_{i} + \epsilon_i \]

Interactions

fit_int <- lm(bill_length_mm ~ bill_depth_mm + species + bill_depth_mm:species,
              data = females)
car::Anova(fit_int)
## Anova Table (Type II tests)
## 
## Response: bill_length_mm
##                        Sum Sq  Df  F value    Pr(>F)    
## bill_depth_mm           48.41   1   9.8428  0.002032 ** 
## species               2419.64   2 245.9610 < 2.2e-16 ***
## bill_depth_mm:species   24.62   2   2.5029  0.085070 .  
## Residuals              782.08 159                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Ancova - One Regressor - Model

For any observation \(1 \leq i \leq n\) in group \(1 \leq k \leq K\), we write: \[ y_{i,k} = \mu + \alpha_k + (\beta + \gamma_k) \times x_{i} + \epsilon_i \qquad \epsilon_i\sim\mathcal{N}(0, \sigma^2)~iid \]

Identifiability: \(\alpha_1 = \gamma_1 = 0\).

\[ \text{With $K=2$:}\qquad \mathbf{X} = \begin{pmatrix} 1 & 0 & x_1 & 0\\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & x_{n_1} & 0 &\\ 1 & 1 & x_{n_1 + 1} & x_{n_1 + 1} \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 1 & x_{n_1 + n_2} & x_{n_1 + n_2}\\ \end{pmatrix} \]
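This design matrix is exactly what `model.matrix` produces for a factor-by-covariate interaction; a minimal sketch with \(K = 2\) hypothetical groups:

```r
## Two hypothetical groups, three observations each
d <- data.frame(g = factor(rep(1:2, each = 3)),
                x = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0))

## Columns: intercept, group-2 indicator, x, and x-by-group-2 interaction
X <- model.matrix(~ g * x, data = d)
X
```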

Ancova

F tests of effects and interactions can be done as before.

More than one predictor? \(\to\) not a problem.

More than one factor? \(\to\) not a problem.

Attention: as models become more complex, they become harder to interpret and test.

Attention: diagnostics on the residuals (CM6) are still needed.

Bill length and depth

fit_species <- lm(bill_length_mm ~ bill_depth_mm + species, data = females)
par(mfrow = c(2,2))
plot(fit_species)

Perspectives

Hasty Summary

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]

  • \(\mathbf{y}\) quantitative response
  • \(\mathbf{X}\) matrix of predictors (quantitative or qualitative)
  • \(\boldsymbol{\beta}\) deterministic coefficient
  • \(\boldsymbol{\epsilon}\) random, centered, null covariance (Gaussian)

Hasty Summary

  • Fit
    • OLS estimates of \(\hat{\boldsymbol{\beta}}\) and \(\hat{\sigma}^2\)
    • ML estimates and CI when Gaussian
  • Model Selection
    • \(t\) test the hypothesis “\(\boldsymbol{\beta}_k = 0\)” (when Gaussian)
    • \(F\) test of nested models (when Gaussian)
    • Penalized criteria (AIC, BIC, Mallows' \(C_p\), …)
  • Prediction
    • New points \(\hat{y}_{i}\), with CI (when Gaussian)
  • Diagnostics
    • Analysis of the residuals (outliers, Gaussian, …)
    • Analysis of the projection matrix (Leverage, Cook, …)

Perspectives and key words

  • Model Selection
    • hard to do when there are many predictors (see CM 5)
      \(\to\) regularized models (lasso, …), Bayesian Statistics
  • No independence assumption
    • Observations are correlated (e.g. geography)
      \(\to\) linear mixed models
  • \(\mathbf{y}\) is qualitative: classification
    • e.g. \(\mathbf{y} = 0/1\) is the response to a treatment (dose, …)
      \(\to\) logistic regression
  • No normal assumption
    • \(\mathbf{y}\) qualitative, counts, positive, …
      \(\to\) Generalized Linear Models (GLM)

Resources

Conclusion

From ANOVA to Mixed Models

Sleep Study

library(lme4)
kable(head(sleepstudy))
Reaction Days Subject
249.5600 0 308
258.7047 1 308
250.8006 2 308
321.4398 3 308
356.8519 4 308
414.6901 5 308

Sleep Study

ANCOVA

For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + \alpha_k + (\beta + \gamma_k) \times x_{i} + \epsilon_{i, k} \]

  • \(\mu\) the intercept
  • \(\alpha_k\) the specific effect of group \(k\) on the intercept
  • \(\gamma_k\) the specific effect of group \(k\) on the slope
  • \(\epsilon_{i, k} \sim \mathcal{N}(0, \sigma^2)\) iid noise.

Problem: What if there are many groups, with few observations?
\(\to\) Estimation problem.

ANCOVA

fit_ancova <- lm(Reaction ~ Days * Subject, data = sleepstudy)
summary(fit_ancova)
## 
## Call:
## lm(formula = Reaction ~ Days * Subject, data = sleepstudy)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -106.397  -10.692   -0.177   11.417  132.510 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      244.193     15.042  16.234  < 2e-16 ***
## Days              21.765      2.818   7.725 1.74e-12 ***
## Subject309       -39.138     21.272  -1.840 0.067848 .  
## Subject310       -40.708     21.272  -1.914 0.057643 .  
## Subject330        45.492     21.272   2.139 0.034156 *  
## Subject331        41.546     21.272   1.953 0.052749 .  
## Subject332        20.059     21.272   0.943 0.347277    
## Subject333        30.826     21.272   1.449 0.149471    
## Subject334        -4.030     21.272  -0.189 0.850016    
## Subject335        18.842     21.272   0.886 0.377224    
## Subject337        45.911     21.272   2.158 0.032563 *  
## Subject349       -29.081     21.272  -1.367 0.173728    
## Subject350       -18.358     21.272  -0.863 0.389568    
## Subject351        16.954     21.272   0.797 0.426751    
## Subject352        32.179     21.272   1.513 0.132535    
## Subject369        10.775     21.272   0.507 0.613243    
## Subject370       -33.744     21.272  -1.586 0.114870    
## Subject371         9.443     21.272   0.444 0.657759    
## Subject372        22.852     21.272   1.074 0.284497    
## Days:Subject309  -19.503      3.985  -4.895 2.61e-06 ***
## Days:Subject310  -15.650      3.985  -3.928 0.000133 ***
## Days:Subject330  -18.757      3.985  -4.707 5.84e-06 ***
## Days:Subject331  -16.499      3.985  -4.141 5.88e-05 ***
## Days:Subject332  -12.198      3.985  -3.061 0.002630 ** 
## Days:Subject333  -12.623      3.985  -3.168 0.001876 ** 
## Days:Subject334   -9.512      3.985  -2.387 0.018282 *  
## Days:Subject335  -24.646      3.985  -6.185 6.07e-09 ***
## Days:Subject337   -2.739      3.985  -0.687 0.492986    
## Days:Subject349   -8.271      3.985  -2.076 0.039704 *  
## Days:Subject350   -2.261      3.985  -0.567 0.571360    
## Days:Subject351  -15.331      3.985  -3.848 0.000179 ***
## Days:Subject352   -8.198      3.985  -2.057 0.041448 *  
## Days:Subject369  -10.417      3.985  -2.614 0.009895 ** 
## Days:Subject370   -3.709      3.985  -0.931 0.353560    
## Days:Subject371  -12.576      3.985  -3.156 0.001947 ** 
## Days:Subject372  -10.467      3.985  -2.627 0.009554 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.59 on 144 degrees of freedom
## Multiple R-squared:  0.8339, Adjusted R-squared:  0.7936 
## F-statistic: 20.66 on 35 and 144 DF,  p-value: < 2.2e-16

Mixed Model

For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]

  • \(\mu\) the intercept
  • \(A_k \sim \mathcal{N}(0, s_A^2)\) iid effect of group \(k\) on the intercept
  • \(G_k \sim \mathcal{N}(0, s_G^2)\) iid effect of group \(k\) on the slope
  • \(\epsilon_{i,k} \sim \mathcal{N}(0, \sigma^2)\) iid noise.

Mixed Model - Simple Case

For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \]

  • \(A_k \sim \mathcal{N}(0, s_A^2)\) iid
  • \(\epsilon_{i,k} \sim \mathcal{N}(0, \sigma^2)\) iid noise.

\[ \mathbb{E}[y_{i,k}] = \mu \]

\[ \mathbb{V}[y_{i,k}] = s_A^2 + \sigma^2 \]

\[ \mathbb{C}[y_{i,k}, y_{i',k'}] = \begin{cases} s_A^2 & \text{if } k = k' \\ 0 & \text{otherwise.} \end{cases} \]

Mixed Model - Simple Case

ANOVA \[ y_{i,k} = \mu + \alpha_{k} + \epsilon_{i,k} \quad \epsilon_{i,k} \sim \mathcal{N}(0, \sigma^2) ~ iid \]

\[ \mathcal{H}_0: \alpha_1 = \cdots = \alpha_K = 0 \]

Mixed Model \[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \quad A_k \sim \mathcal{N}(0, s_A^2) ~ iid \quad \epsilon_{i,k} \sim \mathcal{N}(0, \sigma^2) ~ iid \]

\[ \mathcal{H}_0: s_A^2 = 0 \]

\[~\]

Mixed Model - Simple Case

\[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \]

\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \mathbf{Z}\mathbf{U} + \boldsymbol{\epsilon} \]

\[ \mathbf{y} = \text{vec}((y_{i,k})_{i, k}) = \begin{pmatrix} y_{1,1}\\ y_{2,1}\\ \vdots\\ y_{n, 1}\\ y_{1, 2}\\ \vdots\\ y_{n, 2}\\ \vdots\\ y_{n, K} \end{pmatrix} \qquad \mathbf{Z} = \begin{pmatrix} 1 & 0 & \cdots & 0\\ 1 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & \cdots & 1\\ \end{pmatrix} \qquad \begin{aligned} \mathbf{U} &\sim \mathcal{N}(\mathbf{0}, s_A^2\mathbf{I}_{K}) \\ \boldsymbol{\epsilon} &\sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I}_{nK}) \end{aligned} \]

Mixed Model - Simple Case

\[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \]

\[~\]

\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \mathbf{E} \qquad \mathbf{E} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}) \]

\[~\]

\[ \mathbf{\Sigma} = \begin{pmatrix} \mathbf{R} & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & \mathbf{0}\\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{R}\\ \end{pmatrix} \qquad \mathbf{R} = \begin{pmatrix} \sigma^2 + s_A^2 & s_A^2 & \cdots & s_A^2\\ s_A^2 & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & s_A^2\\ s_A^2 & \cdots & s_A^2 & \sigma^2 + s_A^2\\ \end{pmatrix} \]
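The block-diagonal \(\mathbf{\Sigma}\) can be assembled with a Kronecker product; a minimal sketch with hypothetical sizes \(n = 3\), \(K = 2\):

```r
n <- 3; K <- 2
s_A <- 2; sigma <- 1

## Within-group block: sigma^2 on the diagonal plus constant s_A^2 everywhere
R <- sigma^2 * diag(n) + s_A^2 * matrix(1, n, n)
## Full covariance: block-diagonal with K copies of R
Sigma <- kronecker(diag(K), R)
Sigma
```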

Mixed Model - General Case

Linear Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}_n) \]

\[~\]

Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}) \]

\[~\] with \(\mathbf{\Sigma}\) structured covariance matrix.

Mixed Model - Sleep

For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]

  • \(A_k \sim \mathcal{N}(0, s_A^2)\) iid effect of group \(k\) on the intercept
  • \(G_k \sim \mathcal{N}(0, s_G^2)\) iid effect of group \(k\) on the slope
  • \(\epsilon_{i,k} \sim \mathcal{N}(0, \sigma^2)\) iid noise.

\[ \mathbb{E}[y_{i,k}] = \mu + \beta x_i \]

\[ \mathbb{V}[y_{i,k}] = s_A^2 + s_G^2x_i^2 + \sigma^2 \]

\[ \mathbb{C}[y_{i,k}, y_{i',k'}] = \begin{cases} s_A^2 + s_G^2 x_i x_{i'}& \text{if } k = k' \\ 0 & \text{otherwise.} \end{cases} \]

Mixed Model - Sleep

\[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]

\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \beta \mathbf{x} + \mathbf{Z}_1\mathbf{U}_1 + \mathbf{Z}_2\mathbf{U}_2 + \boldsymbol{\epsilon} \]

\[ \mathbf{y} = \begin{pmatrix} y_{1,1}\\ y_{2,1}\\ \vdots\\ y_{n, 1}\\ y_{1, 2}\\ \vdots\\ y_{n, 2}\\ \vdots\\ y_{n, K} \end{pmatrix} \qquad \mathbf{Z}_2 = \begin{pmatrix} x_1 & 0 & \cdots & 0\\ x_2 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ x_n & 0 & \cdots & 0\\ 0 & x_1 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 0 & x_n & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & \cdots & x_n\\ \end{pmatrix} \qquad \begin{aligned} \mathbf{U}_1 &\sim \mathcal{N}(\mathbf{0}, s_A^2\mathbf{I}_{K}) \\ \mathbf{U}_2 &\sim \mathcal{N}(\mathbf{0}, s_G^2\mathbf{I}_{K}) \\ \boldsymbol{\epsilon} &\sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I}_{nK}) \end{aligned} \]

Mixed Model - Sleep

\[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]

\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \beta \mathbf{x} + \mathbf{Z}_1\mathbf{U}_1 + \mathbf{Z}_2\mathbf{U}_2 + \boldsymbol{\epsilon} \]

\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \beta\mathbf{x} + \mathbf{E} \qquad \mathbf{E} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}) \]

\[ \mathbf{\Sigma} = \sigma^2 \mathbf{I} + s_A^2 \mathbf{Z}_1 \mathbf{Z}_1^T + s_G^2 \mathbf{Z}_2 \mathbf{Z}_2^T = \begin{pmatrix} \mathbf{R} & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & \mathbf{0}\\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{R}\\ \end{pmatrix} \]

\[ \mathbf{R} = \begin{pmatrix} \sigma^2 + s_A^2 + s_G^2 x_1^2 & s_A^2 + s_G^2 x_2x_1 & \cdots & s_A^2 + s_G^2 x_nx_1\\ s_A^2 + s_G^2 x_1x_2 & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & s_A^2 + s_G^2 x_{n-1}x_n\\ s_A^2 + s_G^2 x_1x_n & \cdots & s_A^2 + s_G^2 x_nx_{n-1} & \sigma^2 + s_A^2 + s_G^2 x_n^2\\ \end{pmatrix} \]

Sleep

fit_lmer <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
fit_lmer
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## REML criterion at convergence: 1743.628
## Random effects:
##  Groups   Name        Std.Dev. Corr
##  Subject  (Intercept) 24.741       
##           Days         5.922   0.07
##  Residual             25.592       
## Number of obs: 180, groups:  Subject, 18
## Fixed Effects:
## (Intercept)         Days  
##      251.41        10.47

Mixed Model - General Case

Linear Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}_n) \]

\[~\]

Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}) \]

\[~\] with \(\mathbf{\Sigma}\) structured covariance matrix.

\[~\] \(\to\) General fitting and testing are more difficult.

Mixed Model - Longitudinal

Repeated Measures \[ y_{ikt} = \mu + \alpha_k + \beta_t + \gamma_{kt} + \epsilon_{ikt} \]

  • Animal \(i\)
  • Food \(k\)
  • Time \(t\)

\[ \mathbb{C}[y_{i,k,t}, y_{i',k',t'}] = \begin{cases} \sigma^2 \rho & \text{if } (i,k) = (i',k') \\ 0 & \text{otherwise.} \end{cases} \] \[~\] \(\to\) Constant covariance between time measures.

Mixed Model - Longitudinal

Repeated Measures \[ y_{ikt} = \mu + \alpha_k + \beta_t + \gamma_{kt} + \epsilon_{ikt} \]

  • Animal \(i\)
  • Food \(k\)
  • Time \(t\)

\[ \mathbb{C}[y_{i,k,t}, y_{i',k',t'}] = \begin{cases} \sigma^2 \rho^{|t-t'|} & \text{if } (i,k) = (i',k') \\ 0 & \text{otherwise.} \end{cases} \] \[~\] \(\to\) AR(1) (with \(|\rho| < 1\))
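The AR(1) covariance block over the time points can be built directly from the times; a minimal sketch with hypothetical values \(t = 1, \dotsc, 5\) and \(\rho = 0.6\):

```r
times <- 1:5
rho <- 0.6; sigma <- 1.5

## Correlation rho^|t - t'| between measures on the same animal
C_ar1 <- rho^abs(outer(times, times, "-"))
Sigma_ar1 <- sigma^2 * C_ar1    ## covariance block for one animal
round(C_ar1, 3)
```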

Mixed Model - Spatial

Spatial Measures \[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]

\[ \mathbb{C}[y_{i}, y_{i'}] = \begin{cases} \gamma^2 e^{-d(i,i')/\rho} & \text{if } i \neq i' \\ \sigma^2 + \gamma^2 & \text{if } i = i' \end{cases} \]

  • \(d(i,i')\): distance between locations \(i\) and \(i'\)
  • \(\rho\): range (larger means stronger correlation)
  • \(\gamma^2\): spatial variance
  • \(\sigma^2\): nugget

Mixed Model - General Case

Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}(\boldsymbol{\psi})) \] where \(\mathbf{\Sigma}(\boldsymbol{\psi})\) depends on parameters \(\boldsymbol{\psi}\).

Mixed Model - Simple Case

\[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}) \]

Special case: \[ \mathbf{\Sigma} = \sigma^2 \mathbf{C} \] with \(\mathbf{C}\) known.

Then: \[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{y} \]

\[ \hat{\sigma}^2 = \frac{\|\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \|^2_{\mathbf{C}^{-1}}}{n-p} = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T \mathbf{C}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{n-p} \]

Proof: Use Cholesky decomposition.
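A minimal sketch of this generalized least squares estimator on simulated data with a known correlation matrix \(\mathbf{C}\), checked against ordinary least squares after Cholesky whitening (the proof idea above):

```r
set.seed(6)
n <- 50
X <- cbind(1, rnorm(n))                      ## intercept and one predictor
C <- 0.5^abs(outer(1:n, 1:n, "-"))           ## known correlation matrix
L <- t(chol(C))                              ## Cholesky factor: C = L L^T
y <- drop(X %*% c(1, 2) + L %*% rnorm(n))    ## errors with covariance sigma^2 C

## Generalized least squares estimator
Cinv <- solve(C)
beta_gls <- solve(t(X) %*% Cinv %*% X, t(X) %*% Cinv %*% y)

## Same result by whitening: L^{-1} y = L^{-1} X beta + iid errors
Xw <- solve(L, X)
yw <- drop(solve(L, y))
beta_ols_w <- coef(lm(yw ~ Xw - 1))
cbind(beta_gls, beta_ols_w)
```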

Mixed Model - Maximum Likelihood

Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}(\boldsymbol{\psi})) \]

Maximum-Likelihood Estimators: \[ \hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\psi}} = \underset{\boldsymbol{\beta}, \boldsymbol{\psi}}{\operatorname{argmax}} \log L(\boldsymbol{\beta}, \boldsymbol{\psi} | \mathbf{y}) \]

\[ \hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\psi}} = \underset{\boldsymbol{\beta}, \boldsymbol{\psi}}{\operatorname{argmin}} \left\{ \log |\mathbf{\Sigma}(\boldsymbol{\psi})| + (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T \mathbf{\Sigma}(\boldsymbol{\psi})^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \right\} \]

Write: \[ S(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T \mathbf{\Sigma}(\boldsymbol{\psi})^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \]

Mixed Model - Maximum Likelihood

\[ S(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T \mathbf{\Sigma}(\boldsymbol{\psi})^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \]

\[ \nabla S(\hat{\boldsymbol{\beta}}) = 2 \mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{X}\hat{\boldsymbol{\beta}} - 2 \mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{y} = 0 \]

\[ \hat{\boldsymbol{\beta}} = h(\boldsymbol{\psi}) = (\mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{X})^{-1} \mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{y} \]

Maximising in \(\boldsymbol{\psi}\):

\[ \hat{\boldsymbol{\psi}} = \underset{\boldsymbol{\psi}}{\operatorname{argmin}} \left\{ \log |\mathbf{\Sigma}(\boldsymbol{\psi})| + (\mathbf{y} - \mathbf{X}h(\boldsymbol{\psi}))^T \mathbf{\Sigma}(\boldsymbol{\psi})^{-1}(\mathbf{y} - \mathbf{X}h(\boldsymbol{\psi})) \right\} \]

\[ \hat{\boldsymbol{\beta}} = h(\hat{\boldsymbol{\psi}}) \]

\(\to\) Optimization in \(\boldsymbol{\psi}\) can be complex.
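A minimal sketch of this profiled maximisation for the random-intercept model, using a plain `optim` call on the two variance parameters (an illustration only, not how dedicated software such as `lme4` proceeds):

```r
set.seed(7)
n <- 10; K <- 15                            ## n observations per group, K groups
group <- rep(1:K, each = n)
A <- rnorm(K, sd = 2)                       ## true s_A^2 = 4
y <- 5 + A[group] + rnorm(n * K, sd = 1)    ## true mu = 5, sigma^2 = 1
X <- matrix(1, n * K, 1)                    ## intercept-only fixed effects

## Profiled negative log-likelihood in psi = (s_A^2, sigma^2)
neg_loglik <- function(log_psi) {
  s_A2 <- exp(log_psi[1]); sigma2 <- exp(log_psi[2])
  R <- sigma2 * diag(n) + s_A2 * matrix(1, n, n)       ## within-group block
  Sigma_inv <- kronecker(diag(K), solve(R))            ## block-diagonal inverse
  log_det <- K * as.numeric(determinant(R)$modulus)    ## log |Sigma(psi)|
  beta <- solve(t(X) %*% Sigma_inv %*% X,              ## hat(beta) = h(psi)
                t(X) %*% Sigma_inv %*% y)
  r <- y - X %*% beta
  log_det + as.numeric(t(r) %*% Sigma_inv %*% r)
}

opt <- optim(c(0, 0), neg_loglik)   ## optimise over log(psi) for positivity
exp(opt$par)                        ## ML estimates of (s_A^2, sigma^2)
```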

Mixed Model - Restricted Likelihood

Linear Model \[ \hat{\sigma}^2_{ML} = \frac{\|\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \|^2}{n} \qquad \text{vs} \qquad \hat{\sigma}^2_{REML} = \frac{\|\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \|^2}{n-p} \] \[ \hat{\sigma}^2_{REML} = \underset{\sigma^2}{\operatorname{argmax}} \log L(0, \sigma^2 | \mathbf{P}^{\mathbf{X}^\bot}\mathbf{y}) \]

\[ \mathbf{P}^{\mathbf{X}^\bot}\mathbf{y} = \hat{\boldsymbol{\epsilon}} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{P}^{\mathbf{X}^\bot}) \]

\[ \hat{\sigma}^2_{REML} = \underset{\sigma^2}{\operatorname{argmin}} \left\{ \log |\sigma^2 \mathbf{P}^{\mathbf{X}^\bot}| + (\mathbf{P}^{\mathbf{X}^\bot}\mathbf{y})^T (\sigma^2 \mathbf{P}^{\mathbf{X}^\bot})^{-1}(\mathbf{P}^{\mathbf{X}^\bot}\mathbf{y}) \right\} \]

\[ \hat{\sigma}^2_{REML} = \underset{\sigma^2}{\operatorname{argmin}} \left\{ (n-p) \log(\sigma^2) + \frac{1}{\sigma^2} \|\mathbf{P}^{\mathbf{X}^\bot}\mathbf{y}\|^2 \right\} \]

Mixed Model - Restricted Likelihood

Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}(\boldsymbol{\psi})) \]

Let \(\mathbf{T}\) be an \((n-p)\times n\) matrix such that \(\mathbf{T}\mathbf{X}=\mathbf{0}\). \[ \tilde{\mathbf{y}} = \mathbf{T}\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \mathbf{T}\mathbf{\Sigma}(\boldsymbol{\psi})\mathbf{T}^T) \]

\[ \hat{\boldsymbol{\psi}}_{REML} = \underset{\boldsymbol{\psi}}{\operatorname{argmax}} \log L(\boldsymbol{\psi} | \tilde{\mathbf{y}}) \]

\(\to\) Independent of \(\boldsymbol{\beta}\).

\(\to\) Independent of the choice of \(\mathbf{T}\).

\(\to\) Often, \(\mathbf{T} = \mathbf{P}^{\mathbf{X}^\bot}\).