10/04/2024
Penguins from the Palmer Archipelago in Antarctica; see palmerpenguins.
```r
library(palmerpenguins)
library(knitr)  # for kable()
kable(head(penguins))
```
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
Adelie | Torgersen | NA | NA | NA | NA | NA | 2007 |
Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
We keep only the penguins of species “Adelie”, and remove missing data.
```r
adelie <- subset(penguins, species == "Adelie")
adelie <- na.omit(adelie)
kable(head(adelie))
```
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
Adelie | Torgersen | 38.9 | 17.8 | 181 | 3625 | female | 2007 |
Question: Are male Adelie penguins heavier than female Adelie penguins?
```r
par(mfrow = c(1, 2), mar = c(5, 4, 2, 2) + 0.1)
plot(adelie$body_mass_g, col = adelie$sex) ## scatter plot
boxplot(body_mass_g ~ sex, data = adelie)  ## box plot
```
```r
mass_females <- subset(adelie, sex == "female")$body_mass_g
mass_males <- subset(adelie, sex == "male")$body_mass_g
t.test(mass_males, mass_females, var.equal = TRUE)
```

```
## 
##  Two Sample t-test
## 
## data:  mass_males and mass_females
## t = 13.126, df = 144, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  573.0666 776.2484
## sample estimates:
## mean of x mean of y 
##  4043.493  3368.836
```
Data: two independent samples \(\mathbf{y}^f\) and \(\mathbf{y}^m\), of sizes \(n_f\) and \(n_m\).
Assumption: the observations are Gaussian, with the same variance \(\sigma^2\) in both groups.
Test:
\(\mathcal{H}_0\): \(\mu_f = \mu_m\) vs \(\mathcal{H}_1\): \(\mu_f \neq \mu_m\)
Test Statistic:
\[ T = \frac{\bar{\mathbf{y}}^f - \bar{\mathbf{y}}^m}{s_p \sqrt{\frac{1}{n_f} + \frac{1}{n_m}}} \underset{\mathcal{H}_0}{\sim} \mathcal{T}_{n_f + n_m - 2} \] with the “pooled” standard deviation: \[ s_p = \sqrt{\frac{(n_f-1)s^2_{\mathbf{y}^f} + (n_m-1)s^2_{\mathbf{y}^m}}{n_f + n_m - 2}} \]
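As a sanity check, this statistic can be computed "by hand" and compared with R's `t.test`. The sketch below uses synthetic data (hypothetical group sizes and means, not the penguin data):

```r
# Pooled two-sample t statistic computed by hand, compared with
# t.test(..., var.equal = TRUE).
set.seed(42)
yf <- rnorm(30, mean = 3400, sd = 300)  # hypothetical "female" masses
ym <- rnorm(40, mean = 4000, sd = 300)  # hypothetical "male" masses
nf <- length(yf); nm <- length(ym)

# pooled standard deviation s_p
sp <- sqrt(((nf - 1) * var(yf) + (nm - 1) * var(ym)) / (nf + nm - 2))
# test statistic T
T_stat <- (mean(yf) - mean(ym)) / (sp * sqrt(1 / nf + 1 / nm))

T_r <- unname(t.test(yf, ym, var.equal = TRUE)$statistic)
all.equal(T_stat, T_r)  # TRUE
```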
```r
t.test(mass_males, mass_females, var.equal = TRUE)
```

```
## 
##  Two Sample t-test
## 
## data:  mass_males and mass_females
## t = 13.126, df = 144, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  573.0666 776.2484
## sample estimates:
## mean of x mean of y 
##  4043.493  3368.836
```
Reject the null hypothesis that \(\mu_f = \mu_m\): males and females have different mean body masses.
Assumptions: the observations are independent and Gaussian, with the same variance \(\sigma^2\) in both groups;
in other words: \(y^f_i = \mu_f + \epsilon^f_i\) and \(y^m_j = \mu_m + \epsilon^m_j\), with iid \(\epsilon^f_i, \epsilon^m_j \sim \mathcal{N}(0, \sigma^2)\).
Can we write this as a linear model: “\(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\)”?
Write: \[ \mathbf{y} = \begin{pmatrix} \mathbf{y}^f \\ \mathbf{y}^m \\ \end{pmatrix} \quad \text{and} \quad \boldsymbol{\epsilon} = \begin{pmatrix} \boldsymbol{\epsilon}^f \\ \boldsymbol{\epsilon}^m \\ \end{pmatrix} \] \(\mathbf{y}\) and \(\boldsymbol{\epsilon}\) vectors of size \(n = n_f + n_m\).
Then, for \(1 \leq i \leq n\): \[ y_i = \mu_k + \epsilon_i \quad \text{with} \quad \begin{cases} \mu_k = \mu_f & \text{if } 1 \leq i \leq n_f\\ \mu_k = \mu_m & \text{if } n_f+1 \leq i \leq n_f + n_m \end{cases} \]
Regression matrix \(\mathbf{X}\)?
\[ \text{Write:} \qquad \mathbf{X} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots\\ 0 & 1 \\ \end{pmatrix} \begin{matrix} \\ n_f\\ \\ \\ n_m\\ \\ \end{matrix} \qquad \text{and} \qquad \boldsymbol{\beta} = \begin{pmatrix} \mu_f \\ \mu_m \\ \end{pmatrix} \]
\[ \text{Then:} \qquad \mathbf{X}\boldsymbol{\beta} = \begin{pmatrix} \mu_f + 0\\ \vdots \\ \mu_f + 0\\ 0 + \mu_m\\ \vdots\\ 0 + \mu_m \\ \end{pmatrix} = \begin{pmatrix} \mu_f \\ \vdots \\ \mu_f \\ \mu_m \\ \vdots\\ \mu_m \\ \end{pmatrix} \begin{matrix} \\ n_f\\ \\ \\ n_m\\ \\ \end{matrix} \]
Hence, the model can be written as: \[ \begin{pmatrix} y^f_1 \\ \vdots\\ y^f_{n_f} \\ y^m_{1} \\ \vdots\\ y^m_{n_m} \\ \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots\\ 0 & 1 \\ \end{pmatrix} \begin{pmatrix} \mu_f \\ \mu_m \\ \end{pmatrix} + \begin{pmatrix} \epsilon^f_1 \\ \vdots\\ \epsilon^f_{n_f} \\ \epsilon^m_{1} \\ \vdots\\ \epsilon^m_{n_m} \\ \end{pmatrix} \]
i.e: \[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \text{with} \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n) \]
\(\to\) classical linear model with Gaussian assumption!
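A minimal numerical sketch of this cell-means parametrization, on synthetic data (group sizes and means are hypothetical): solving the normal equations with the indicator matrix \(\mathbf{X}\) recovers the two group means.

```r
# Build X = (indicator of group f, indicator of group m) and solve OLS:
# beta_hat = (X'X)^{-1} X'y should equal (mean(yf), mean(ym)).
set.seed(1)
nf <- 5; nm <- 7
yf <- rnorm(nf, 3400, 300); ym <- rnorm(nm, 4000, 300)
y <- c(yf, ym)

X <- cbind(c(rep(1, nf), rep(0, nm)),   # first column: "female" rows
           c(rep(0, nf), rep(1, nm)))   # second column: "male" rows

beta_hat <- as.vector(solve(t(X) %*% X, t(X) %*% y))
all.equal(beta_hat, c(mean(yf), mean(ym)))  # TRUE
```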
\[ \text{Write:} \qquad \mathbf{X} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots\\ 1 & 1 \\ \end{pmatrix} \begin{matrix} \\ n_f\\ \\ \\ n_m\\ \\ \end{matrix} \qquad \text{and} \qquad \boldsymbol{\beta} = \begin{pmatrix} \mu_f \\ \mu_m - \mu_f \\ \end{pmatrix} \]
\[ \text{Then:} \qquad \mathbf{X}\boldsymbol{\beta} = \begin{pmatrix} \mu_f + 0\\ \vdots \\ \mu_f + 0\\ \mu_f + \mu_m - \mu_f\\ \vdots\\ \mu_f + \mu_m - \mu_f \\ \end{pmatrix} = \begin{pmatrix} \mu_f \\ \vdots \\ \mu_f \\ \mu_m \\ \vdots\\ \mu_m \\ \end{pmatrix} \begin{matrix} \\ n_f\\ \\ \\ n_m\\ \\ \end{matrix} \]
The model can also be written as: \[ \begin{pmatrix} y^f_1 \\ \vdots\\ y^f_{n_f} \\ y^m_{1} \\ \vdots\\ y^m_{n_m} \\ \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots\\ 1 & 1 \\ \end{pmatrix} \begin{pmatrix} \mu_f \\ \mu_m - \mu_f \\ \end{pmatrix} + \begin{pmatrix} \epsilon^f_1 \\ \vdots\\ \epsilon^f_{n_f} \\ \epsilon^m_{1} \\ \vdots\\ \epsilon^m_{n_m} \\ \end{pmatrix} \]
i.e: \[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \text{with} \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n) \]
\(\to\) parametrization used in R
\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \text{with} \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n) \]
As \[ \boldsymbol{\beta} = \begin{pmatrix} \mu_f \\ \mu_m - \mu_f \\ \end{pmatrix}, \]
the test \[ \mathcal{H}_0: \mu_f = \mu_m \quad \text{vs} \quad \mathcal{H}_1: \mu_f \neq \mu_m \]
becomes: \[ \mathcal{H}_0: \beta_2 = 0 \quad \text{vs} \quad \mathcal{H}_1: \beta_2 \neq 0 \]
\(\to\) \(t\)-test on the \(\beta_2\) coefficient!
```r
fit <- lm(body_mass_g ~ sex, data = adelie)
model.matrix(fit)[1:10, ]
```

```
##    (Intercept) sexmale
## 1            1       1
## 2            1       0
## 3            1       0
## 4            1       0
## 5            1       1
## 6            1       0
## 7            1       1
## 8            1       0
## 9            1       1
## 10           1       1
```
```r
summary(fit)
```

```
## 
## Call:
## lm(formula = body_mass_g ~ sex, data = adelie)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -718.49 -218.84  -18.84  225.00  731.51 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3368.84      36.34   92.69   <2e-16 ***
## sexmale       674.66      51.40   13.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 310.5 on 144 degrees of freedom
## Multiple R-squared:  0.5447, Adjusted R-squared:  0.5416 
## F-statistic: 172.3 on 1 and 144 DF,  p-value: < 2.2e-16
```
```r
t.test(mass_males, mass_females, var.equal = TRUE)
```

```
## 
##  Two Sample t-test
## 
## data:  mass_males and mass_females
## t = 13.126, df = 144, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  573.0666 776.2484
## sample estimates:
## mean of x mean of y 
##  4043.493  3368.836
```
Simple \(t\) test: \[ T = \frac{\bar{\mathbf{y}}^f - \bar{\mathbf{y}}^m}{s_p \sqrt{\frac{1}{n_f} + \frac{1}{n_m}}} \underset{\mathcal{H}_0}{\sim} \mathcal{T}_{n_f + n_m - 2} \quad \text{with} \quad s_p = \sqrt{\frac{(n_f-1)s^2_{\mathbf{y}^f} + (n_m-1)s^2_{\mathbf{y}^m}}{n_f + n_m - 2}} \]
Linear Regression: \[ \frac{\hat{\beta}_k - \beta_k}{\sqrt{\hat{\sigma}^2 [(\mathbf{X}^T\mathbf{X})^{-1}]_{kk}}} \sim \mathcal{T}_{n-p} \] with \[ p = 2 \qquad n = n_f + n_m \qquad \beta_2 = 0 \text{ under } \mathcal{H}_0 \]
i.e. \[ \frac{\hat{\beta}_2}{\sqrt{\hat{\sigma}^2 [(\mathbf{X}^T\mathbf{X})^{-1}]_{22}}} \underset{\mathcal{H}_0}{\sim} \mathcal{T}_{n_f + n_m - 2} \]
Let’s show that: \[ \frac{\hat{\beta}_2}{\sqrt{\hat{\sigma}^2 [(\mathbf{X}^T\mathbf{X})^{-1}]_{22}}} \quad \text{with} \quad \mathbf{X} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots\\ 1 & 1 \\ \end{pmatrix} \begin{matrix} \\ n_f\\ \\ \\ n_m\\ \\ \end{matrix} \qquad \text{and} \qquad \boldsymbol{\beta} = \begin{pmatrix} \mu_f \\ \mu_m - \mu_f \\ \end{pmatrix} \]
is equal to \[ \frac{\bar{\mathbf{y}}^f - \bar{\mathbf{y}}^m}{s_p \sqrt{\frac{1}{n_f} + \frac{1}{n_m}}} \quad \text{with} \quad s_p = \sqrt{\frac{(n_f-1)s^2_{\mathbf{y}^f} + (n_m-1)s^2_{\mathbf{y}^m}}{n_f + n_m - 2}} \]
\[ \mathbf{X}^T\mathbf{X} = \begin{pmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ 0 & \cdots & 0 & 1 & \cdots & 1 \\ \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots\\ 1 & 1 \\ \end{pmatrix} = \begin{pmatrix} n & n_m \\ n_m & n_m \end{pmatrix} \]
Hence: \[ (\mathbf{X}^T\mathbf{X})^{-1} = \begin{pmatrix} n & n_m \\ n_m & n_m \end{pmatrix}^{-1} = \frac{1}{nn_m - n_m^2} \begin{pmatrix} n_m & -n_m \\ -n_m & n \end{pmatrix} \]
And: \[ [(\mathbf{X}^T\mathbf{X})^{-1}]_{22} = \frac{n}{nn_m - n_m^2} = \frac{n_f + n_m}{(n - n_m)n_m} = \frac{n_f + n_m}{n_f n_m} = \frac{1}{n_f} + \frac{1}{n_m} \]
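The identity \([(\mathbf{X}^T\mathbf{X})^{-1}]_{22} = \frac{1}{n_f} + \frac{1}{n_m}\) is easy to check numerically (hypothetical group sizes):

```r
# Intercept + indicator parametrization, as used by R:
nf <- 6; nm <- 9
X <- cbind(1, c(rep(0, nf), rep(1, nm)))  # columns: (Intercept), "sexmale"
XtX_inv <- solve(t(X) %*% X)
all.equal(XtX_inv[2, 2], 1 / nf + 1 / nm)  # TRUE
```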
\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \]
But: \[ \mathbf{X}^{T}\mathbf{y} = \begin{pmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ 0 & \cdots & 0 & 1 & \cdots & 1 \\ \end{pmatrix} \mathbf{y} = \begin{pmatrix} \sum_{i = 1}^n y_i\\ \sum_{j = 1}^{n_m} y^m_j \end{pmatrix} \]
Hence: \[ \begin{aligned} \hat{\boldsymbol{\beta}} & = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} = \frac{1}{n_fn_m} \begin{pmatrix} n_m & -n_m \\ -n_m & n \end{pmatrix} \begin{pmatrix} \sum_{i = 1}^n y_i\\ \sum_{j = 1}^{n_m} y^m_j \end{pmatrix} \\ &= %\frac{1}{n_fn_m} %\begin{pmatrix} %n_m\sum_{i = 1}^n y_i - n_m\sum_{j = 1}^{n_m} y^m_j\\ %- n_m\sum_{i = 1}^{n} y_i + n\sum_{j = 1}^{n_m} y^m_j\\ %\end{pmatrix} %= \frac{1}{n_fn_m} \begin{pmatrix} n_m\sum_{i = 1}^n y_i - n_m\sum_{j = 1}^{n_m} y^m_j\\ - n_m\sum_{i = 1}^{n} y_i + n_m\sum_{j = 1}^{n_m} y^m_j + n_f\sum_{j = 1}^{n_m} y^m_j\\ \end{pmatrix} \\ &= \frac{1}{n_fn_m} \begin{pmatrix} n_m \sum_{i = 1}^{n_f} y^f_i\\ - n_m\sum_{i = 1}^{n_f} y^f_i + n_f\sum_{j = 1}^{n_m} y^m_j\\ \end{pmatrix} \end{aligned} \]
\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \]
And: \[ \hat{\boldsymbol{\beta}} = \frac{1}{n_fn_m} \begin{pmatrix} n_m \sum_{i = 1}^{n_f} y^f_i\\ - n_m\sum_{i = 1}^{n_f} y^f_i + n_f\sum_{j = 1}^{n_m} y^m_j\\ \end{pmatrix} = \begin{pmatrix} \bar{\mathbf{y}}^f\\ -\bar{\mathbf{y}}^f + \bar{\mathbf{y}}^m\\ \end{pmatrix} \]
Note: \[ \text{since } \quad \boldsymbol{\beta} = \begin{pmatrix} \mu_f \\ \mu_m - \mu_f \\ \end{pmatrix} \quad \text{this result is coherent.} \]
Note: \[ \text{As} \quad \mathbf{x}^{i} = \begin{cases} (1~0) & \text{ if $i$ female} \\ (1~1) & \text{ if $i$ male} \\ \end{cases} \quad \text{we get:} \quad \hat{y}_i = \mathbf{x}^{i}\hat{\boldsymbol{\beta}} = \begin{cases} \bar{\mathbf{y}}^f & \text{ if $i$ female} \\ \bar{\mathbf{y}}^m & \text{ if $i$ male} \\ \end{cases} \]
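On synthetic two-group data (sizes and means hypothetical), `lm()` indeed returns \(\hat\beta_1 = \bar{\mathbf{y}}^f\) and \(\hat\beta_2 = \bar{\mathbf{y}}^m - \bar{\mathbf{y}}^f\):

```r
set.seed(2)
sex <- factor(rep(c("female", "male"), times = c(8, 10)))
mass <- rnorm(18, mean = ifelse(sex == "male", 4000, 3400), sd = 300)

fit <- lm(mass ~ sex)                  # reference level: "female"
yf_bar <- mean(mass[sex == "female"])
ym_bar <- mean(mass[sex == "male"])
all.equal(unname(coef(fit)), c(yf_bar, ym_bar - yf_bar))  # TRUE
```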
\[ \hat{\sigma}^2 = \frac{1}{n - 2}\sum_{i = 1}^n (y_i - \hat{y}_i)^2 \]
Hence: \[ \hat{\sigma}^2 = \frac{1}{n_f + n_m - 2} \left[ \sum_{i = 1}^{n_f} (y^f_i - \bar{\mathbf{y}}^f)^2 + \sum_{i = 1}^{n_m} (y^m_i - \bar{\mathbf{y}}^m)^2 \right] \]
Recall: \[ s^2_p = \frac{(n_f-1)s^2_{\mathbf{y}^f} + (n_m-1)s^2_{\mathbf{y}^m}}{n_f + n_m - 2} \]
hence: \[ \hat{\sigma}^2 = s^2_p. \]
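Numerically, the residual variance estimate of the regression matches the pooled variance \(s_p^2\) (synthetic data again):

```r
set.seed(3)
sex <- factor(rep(c("female", "male"), times = c(12, 15)))
mass <- rnorm(27, mean = ifelse(sex == "male", 4000, 3400), sd = 300)
fit <- lm(mass ~ sex)

yf <- mass[sex == "female"]; ym <- mass[sex == "male"]
sp2 <- ((length(yf) - 1) * var(yf) + (length(ym) - 1) * var(ym)) /
  (length(yf) + length(ym) - 2)
all.equal(sigma(fit)^2, sp2)  # TRUE: hat(sigma)^2 = s_p^2
```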
\[ T = \frac{\hat{\beta}_2}{\sqrt{\hat{\sigma}^2 [(\mathbf{X}^T\mathbf{X})^{-1}]_{22}}} \]
but: \[ \hat{\beta}_2 = \bar{\mathbf{y}}^m -\bar{\mathbf{y}}^f \\ \hat{\sigma}^2 = s^2_p \\ [(\mathbf{X}^T\mathbf{X})^{-1}]_{22} = \frac{1}{n_f} + \frac{1}{n_m} \]
hence: \[ T = \frac{\bar{\mathbf{y}}^f - \bar{\mathbf{y}}^m}{s_p \sqrt{\frac{1}{n_f} + \frac{1}{n_m}}} \]
And the simple \(t\)-test is the same as the \(t\)-test on the coefficient of the linear regression.
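The whole equivalence can be checked in one shot on synthetic data: the two-sample \(t\) statistic equals the \(t\) value of the slope coefficient in the regression (taking the difference in the same direction):

```r
set.seed(9)
sex <- factor(rep(c("female", "male"), times = c(20, 25)))
mass <- rnorm(45, mean = ifelse(sex == "male", 4000, 3400), sd = 300)

t_two_sample <- t.test(mass[sex == "male"], mass[sex == "female"],
                       var.equal = TRUE)
fit <- lm(mass ~ sex)
t_lm <- summary(fit)$coefficients["sexmale", "t value"]
all.equal(unname(t_two_sample$statistic), t_lm)  # TRUE
```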
Group comparison: testing the equality of the means of two groups with a two-sample \(t\) test
is the same as a Linear model with: a design matrix of group indicators, and a \(t\) test on the coefficient encoding the difference of the group means.
\(\to\) what happens when there are more than two groups?
We keep only the female penguins of all species, and remove missing data.
```r
females <- subset(penguins, sex == "female")
females <- na.omit(females)
kable(females[c(1, 2, 101, 102, 151, 152), ])
```
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
Gentoo | Biscoe | 43.8 | 13.9 | 208 | 4300 | female | 2008 |
Gentoo | Biscoe | 43.2 | 14.5 | 208 | 4450 | female | 2008 |
Chinstrap | Dream | 46.9 | 16.6 | 192 | 2700 | female | 2008 |
Chinstrap | Dream | 46.2 | 17.5 | 187 | 3650 | female | 2008 |
Question: do the females of each species have different body masses?
```r
par(mfrow = c(1, 2), mar = c(5, 4, 2, 2) + 0.1)
plot(females$body_mass_g, col = females$species) ## scatter plot
boxplot(body_mass_g ~ species, data = females)   ## box plot
```
Pairwise \(t\) tests:
```r
pairwise.t.test(females$body_mass_g, females$species, p.adjust.method = "none")
```

```
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  females$body_mass_g and females$species 
## 
##           Adelie Chinstrap
## Chinstrap 0.0066 -        
## Gentoo    <2e-16 <2e-16   
## 
## P value adjustment method: none
```
\(\to\) \({K \choose 2}\) tests: multiple testing problems.
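In practice one would correct these p-values for multiplicity; a sketch on synthetic three-group data (the Holm method here is one possible choice):

```r
set.seed(8)
grp <- factor(rep(c("a", "c", "g"), each = 10))
y <- rnorm(30, mean = c(3400, 3500, 4700)[as.integer(grp)], sd = 300)
res <- pairwise.t.test(y, grp, p.adjust.method = "holm")
res$p.value  # choose(3, 2) = 3 adjusted p-values (lower triangle)
```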
Model:
Pairwise Tests:
\(\mathcal{H}^{s_1,s_2}_0\): \(\mu_{s_1} = \mu_{s_2}\) vs \(\mathcal{H}^{s_1,s_2}_1\): \(\mu_{s_1} \neq \mu_{s_2}\)
for any pair \((s_1,s_2)\) of species.
Global Test: can we test
\(\mathcal{H}_0\): \(\mu_a = \mu_c = \mu_g\) vs \(\mathcal{H}_1\): \(\exists s_1, s_2 \in S ~|~ \mu_{s_1} \neq \mu_{s_2}\)?
\[ \begin{pmatrix} y^1_1 \\ \vdots\\ y^1_{n_1} \\ y^2_{1} \\ \vdots\\ y^2_{n_2} \\ \vdots\\ y^K_{1} \\ \vdots\\ y^K_{n_K} \\ \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \cdots & \vdots\\ 1 & 0 & 0 & \cdots & 0\\ 1 & 1 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \cdots & \vdots\\ 1 & 1 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \cdots & \vdots\\ 1 & 0 & 0 & \cdots & 1\\ \vdots & \vdots & \vdots & \cdots & \vdots\\ 1 & 0 & 0 & \cdots & 1\\ \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 - \mu_1 \\ \vdots\\ \mu_K - \mu_1 \\ \end{pmatrix} + \begin{pmatrix} \epsilon^1_1 \\ \vdots\\ \epsilon^1_{n_1} \\ \epsilon^2_{1} \\ \vdots\\ \epsilon^2_{n_2} \\ \vdots\\ \epsilon^K_{1} \\ \vdots\\ \epsilon^K_{n_K} \\ \end{pmatrix} \]
Model:
\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \text{with} \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n) \]
Note: Assuming that the groups are disjoint and non-empty, \(\text{rank}(\mathbf{X}) = K\).
This is equivalent to the following model: \[ \begin{aligned} y_i &= \beta_1 + \epsilon_i = \mu_1 + \epsilon_i & \text{ if } &i \text{ is in group } 1\\ y_i &= \beta_1 + \beta_k + \epsilon_i = \mu_k + \epsilon_i & \text{ if } &i \text{ is in group } k,\ k>1 \\ \end{aligned} \]
i.e. each group has a mean \(\mu_k\), and the errors are Gaussian iid.
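A sketch with \(K = 3\) synthetic groups (hypothetical means): whatever the parametrization, the fitted value of each observation is its group mean.

```r
set.seed(4)
grp <- factor(rep(c("a", "c", "g"), times = c(5, 6, 7)))
means <- c(a = 3400, c = 3700, g = 4700)       # hypothetical group means
y <- rnorm(18, mean = means[as.character(grp)], sd = 300)

fit <- lm(y ~ grp)
group_means <- tapply(y, grp, mean)
all.equal(unname(fitted(fit)),
          unname(group_means[as.character(grp)]))  # TRUE
```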
```r
fit <- lm(body_mass_g ~ species, data = females)
model.matrix(fit)[c(1, 2, 101, 102, 151, 152), ]
```

```
##     (Intercept) speciesChinstrap speciesGentoo
## 1             1                0             0
## 2             1                0             0
## 101           1                0             1
## 102           1                0             1
## 151           1                1             0
## 152           1                1             0
```
```r
summary(fit)
```

```
## 
## Call:
## lm(formula = body_mass_g ~ species, data = females)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -827.21 -193.84   20.26  181.16  622.79 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3368.84      32.42 103.908  < 2e-16 ***
## speciesChinstrap   158.37      57.52   2.754  0.00657 ** 
## speciesGentoo     1310.91      48.72  26.904  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 277 on 162 degrees of freedom
## Multiple R-squared:  0.8292, Adjusted R-squared:  0.8271 
## F-statistic: 393.2 on 2 and 162 DF,  p-value: < 2.2e-16
```
It is easy to show that:
\[ \hat{\beta}_1 = \bar{\mathbf{y}}^{s_1}, \qquad \hat{\beta}_k = \bar{\mathbf{y}}^{s_k} - \bar{\mathbf{y}}^{s_1} \quad \text{for } 2 \leq k \leq K. \] Hence:
We have: \[ \boldsymbol{\beta} = \begin{pmatrix} \mu_{s_1}\\ \mu_{s_2} - \mu_{s_1}\\ \vdots\\ \mu_{s_K} - \mu_{s_1}\\ \end{pmatrix} \quad \text{and} \quad \hat{\boldsymbol{\beta}} = \begin{pmatrix} \bar{\mathbf{y}}^{s_1}\\ \bar{\mathbf{y}}^{s_2} - \bar{\mathbf{y}}^{s_1}\\ \vdots\\ \bar{\mathbf{y}}^{s_K} - \bar{\mathbf{y}}^{s_1}\\ \end{pmatrix} \]
\[ \mathcal{H}^{k}_0: \beta_{k} = 0 \quad \text{vs} \quad \mathcal{H}^{k}_1: \beta_{k} \neq 0 \]
is the same as the test of the difference between groups \(1\) & \(k\):
\[ \mathcal{H}^{1,k}_0: \mu_{1} = \mu_{k} \quad \text{vs} \quad \mathcal{H}^{1,k}_1: \mu_{1} \neq \mu_{k} \]
\[ \mathcal{H}_0: \beta_{2} = \beta_3 = \cdots = \beta_{K} = 0 \quad \text{vs} \quad \mathcal{H}_1: \exists k ~|~ \beta_{k} \neq 0 \]
is the same as the global test of mean equality of all groups:
\[ \mathcal{H}_0: \mu_{1} = \mu_2 = \cdots = \mu_{K} \quad \text{vs} \quad \mathcal{H}_1: \exists k, k' ~|~ \mu_{k} \neq \mu_{k'} \]
```r
fit <- lm(body_mass_g ~ species, data = females)
summary(fit)
```

```
## 
## Call:
## lm(formula = body_mass_g ~ species, data = females)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -827.21 -193.84   20.26  181.16  622.79 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3368.84      32.42 103.908  < 2e-16 ***
## speciesChinstrap   158.37      57.52   2.754  0.00657 ** 
## speciesGentoo     1310.91      48.72  26.904  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 277 on 162 degrees of freedom
## Multiple R-squared:  0.8292, Adjusted R-squared:  0.8271 
## F-statistic: 393.2 on 2 and 162 DF,  p-value: < 2.2e-16
```
In general, we might write for observation \(i\) of group \(k\): \[ y_{i,k} = \mu + \beta_k + \epsilon_i \]
The model has \(K+1\) parameters.
The associated regression matrix would be: \[ \mathbf{X} = (\mathbb{1}, \mathbb{1}(S_1), \dotsc, \mathbb{1}(S_K)) \]
\(\mathbf{X}\) is not of full rank: \(\mathbb{1} = \sum_{k=1}^K \mathbb{1}(S_k)\).
\[ \mathbf{X} = \begin{pmatrix} 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots\\ 1 & 0 & 1 \\ \end{pmatrix} \]
We need some constraint on the parameters.
\[ y_{i,k} = \mu + \beta_k + \epsilon_i \]
In the previous studies, we assumed \[\beta_1 = 0\]
\(\to\) This is the constraint used by default in R.
\[ y_{i,k} = \mu + \beta_k + \epsilon_i \]
We can assume \[\mu = 0\]
\[ y_{i,k} = \mu + \beta_k + \epsilon_i \]
We can assume \[ \sum_{k = 1}^K \beta_k= 0 \qquad \text{i.e.} \qquad \beta_K = - \sum_{k = 1}^{K-1} \beta_k \]
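This null-sum constraint corresponds to `contr.sum` in R. A sketch on synthetic data: the coefficients change with the constraint, but the fitted values do not.

```r
set.seed(5)
grp <- factor(rep(c("a", "c", "g"), times = c(4, 5, 6)))
y <- rnorm(15, mean = c(3400, 3700, 4700)[as.integer(grp)], sd = 300)

fit_trt <- lm(y ~ grp)                                       # default: beta_1 = 0
fit_sum <- lm(y ~ grp, contrasts = list(grp = "contr.sum"))  # sum_k beta_k = 0

# Under contr.sum the intercept is the unweighted mean of the group means:
all.equal(unname(coef(fit_sum)[1]), mean(tapply(y, grp, mean)))  # TRUE
all.equal(fitted(fit_trt), fitted(fit_sum))                      # TRUE
```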
\[ y_{i,k} = \mu + \beta_k + \epsilon_i \]
We can assume \[ \sum_{k = 1}^K n_k \beta_k= 0 \qquad \text{i.e.} \qquad \beta_K = -\frac{1}{n_K} \sum_{k = 1}^{K-1} n_k \beta_k \]
\[ y_{i,k} = \mu + \beta_k + \epsilon_i \]
ANOVA Test: Does the grouping have any effect at all? I.e. \[ \begin{aligned} \mathcal{H}_0&: \beta_{1} = \cdots = \beta_K = 0 &\text{vs}\\ \mathcal{H}_1&: \exists~k\in \{1, \dotsc, K\} ~|~ \beta_k \neq 0 \end{aligned} \]
No matter which set of constraints we choose, the \(F\) statistic associated with the one-factor ANOVA is:
\[ F = \frac{ \|\hat{\mathbf{y}} - \bar{y}\mathbb{1}\|^2 / (K - 1) }{ \|\mathbf{y} - \hat{\mathbf{y}}\|^2 / (n - K) } \underset{\mathcal{H}_0}{\sim} \mathcal{F}^{K - 1}_{n - K}. \]
\(\to\) Global \(F\) test on the coefficients of the linear regression.
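This \(F\) statistic can be recomputed from the fitted values and compared with `anova()` (synthetic three-group data, hypothetical means):

```r
set.seed(6)
grp <- factor(rep(c("a", "c", "g"), times = c(7, 8, 9)))
y <- rnorm(24, mean = c(3400, 3700, 4700)[as.integer(grp)], sd = 300)
n <- length(y); K <- nlevels(grp)

fit <- lm(y ~ grp)
F_hand <- (sum((fitted(fit) - mean(y))^2) / (K - 1)) /
  (sum(residuals(fit)^2) / (n - K))
all.equal(F_hand, anova(fit)[1, "F value"])  # TRUE
```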
ANOVA = “ANalysis Of VAriance”
With the R parametrization: \[ \boldsymbol{\beta} = \begin{pmatrix} \mu_{s_1}\\ \mu_{s_2} - \mu_{s_1}\\ \vdots\\ \mu_{s_K} - \mu_{s_1}\\ \end{pmatrix} \quad \text{and} \quad \hat{\boldsymbol{\beta}} = \begin{pmatrix} \bar{\mathbf{y}}^{s_1}\\ \bar{\mathbf{y}}^{s_2} - \bar{\mathbf{y}}^{s_1}\\ \vdots\\ \bar{\mathbf{y}}^{s_K} - \bar{\mathbf{y}}^{s_1}\\ \end{pmatrix} \]
The test \[ \mathcal{H}_0: \beta_{2} = \beta_3 = \cdots = \beta_{K} = 0 \quad \text{vs} \quad \mathcal{H}_1: \exists k ~|~ \beta_{k} \neq 0 \]
is the same as the global test of mean equality of all groups:
\[ \mathcal{H}_0: \mu_{1} = \mu_2 = \cdots = \mu_{K} \quad \text{vs} \quad \mathcal{H}_1: \exists k, k' ~|~ \mu_{k} \neq \mu_{k'} \]
```r
fit <- lm(body_mass_g ~ species, data = females)
summary(fit)
```

```
## 
## Call:
## lm(formula = body_mass_g ~ species, data = females)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -827.21 -193.84   20.26  181.16  622.79 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3368.84      32.42 103.908  < 2e-16 ***
## speciesChinstrap   158.37      57.52   2.754  0.00657 ** 
## speciesGentoo     1310.91      48.72  26.904  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 277 on 162 degrees of freedom
## Multiple R-squared:  0.8292, Adjusted R-squared:  0.8271 
## F-statistic: 393.2 on 2 and 162 DF,  p-value: < 2.2e-16
```
\[ F = \frac{ \|\hat{\mathbf{y}} - \bar{y}\mathbb{1}\|^2 / (K - 1) }{ \|\mathbf{y} - \hat{\mathbf{y}}\|^2 / (n - K) } = \frac{ ESS / (K - 1) }{ RSS / (n - K) } \underset{\mathcal{H}_0}{\sim} \mathcal{F}^{K - 1}_{n - K}. \]
“Anova table”:
|  | Df | SS | Mean SS | F | \(p\) value |
|---|---|---|---|---|---|
| factor | \(K - 1\) | \(ESS\) | \(ESS / (K - 1)\) | \(\frac{ESS / (K - 1)}{RSS / (n - K)}\) | \(p\) |
| residuals | \(n - K\) | \(RSS\) | \(RSS / (n - K)\) |  |  |
| total | \(n - 1\) | \(TSS\) |  |  |  |
```r
fit <- lm(body_mass_g ~ species, data = females)
anova(fit)
```

```
## Analysis of Variance Table
## 
## Response: body_mass_g
##            Df   Sum Sq  Mean Sq F value    Pr(>F)    
## species     2 60350016 30175008  393.25 < 2.2e-16 ***
## Residuals 162 12430757    76733                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
We keep all the penguins of any species and any sex, and remove missing data.
```r
penguins_nona <- na.omit(penguins)
kable(penguins_nona[c(1, 2, 151, 152, 271, 272), ])
```
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
Gentoo | Biscoe | 47.6 | 14.5 | 215 | 5400 | male | 2007 |
Gentoo | Biscoe | 46.5 | 13.5 | 210 | 4550 | female | 2007 |
Chinstrap | Dream | 45.2 | 17.8 | 198 | 3950 | female | 2007 |
Chinstrap | Dream | 46.1 | 18.2 | 178 | 3250 | female | 2007 |
Question: what is the effect of both species and sex on the body mass?
```r
par(mfrow = c(1, 2), mar = c(5, 4, 2, 2) + 0.1)
plot(penguins_nona$body_mass_g,               ## scatter plot
     pch = as.numeric(penguins_nona$species), ## point type = species
     col = penguins_nona$sex)                 ## color = sex
boxplot(body_mass_g ~ sex + species, data = penguins_nona, ## box plot
        col = c("grey", "red"))
```
Observation \(i\) of group \(k\) in factor 1 and group \(l\) in factor 2 is written as: \[ y_{i,k,l} = \mu + \alpha_k + \beta_l + \gamma_{kl} + \epsilon_i \]
The full model has regression matrix: \[ {\small \mathbf{X} = (\mathbb{1}, \mathbb{1}_{S_1}, \dotsc, \mathbb{1}_{S_K}, \mathbb{1}_{T_1}, \dotsc, \mathbb{1}_{T_L}, \mathbb{1}_{S_1, T_1}, \dotsc, \mathbb{1}_{S_K, T_L} ) } \]
\(\to\) \(1 + K + L + KL\) predictors \(\to\) not identifiable!
\[ y_{i,k,l} = \mu + \alpha_k + \beta_l + \gamma_{kl} + \epsilon_i \]
Assume, for all \(1\leq k \leq K\) and \(1\leq l \leq L\): \[ \alpha_1 = 0, \qquad \beta_1 = 0, \qquad \gamma_{1,l} = 0, \qquad \gamma_{k,1} = 0. \]
\(\to\) This is the constraint used by default in R.
```r
fit <- lm(body_mass_g ~ species * sex, data = penguins_nona)
summary(fit)
```

```
## 
## Call:
## lm(formula = body_mass_g ~ species * sex, data = penguins_nona)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -827.21 -213.97   11.03  206.51  861.03 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               3368.84      36.21  93.030  < 2e-16 ***
## speciesChinstrap           158.37      64.24   2.465  0.01420 *  
## speciesGentoo             1310.91      54.42  24.088  < 2e-16 ***
## sexmale                    674.66      51.21  13.174  < 2e-16 ***
## speciesChinstrap:sexmale  -262.89      90.85  -2.894  0.00406 ** 
## speciesGentoo:sexmale      130.44      76.44   1.706  0.08886 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 309.4 on 327 degrees of freedom
## Multiple R-squared:  0.8546, Adjusted R-squared:  0.8524 
## F-statistic: 384.3 on 5 and 327 DF,  p-value: < 2.2e-16
```
\[ y_{i,k,l} = \mu + \alpha_k + \beta_l + \gamma_{kl} + \epsilon_i \]
Interactions only: for all \(1\leq k \leq K\) and \(1\leq l \leq L\): \[ \mu = 0,\qquad \alpha_k = 0, \qquad \beta_l = 0. \]
Null sum: \[ \sum_{k=1}^K\alpha_k = 0, \qquad \sum_{l=1}^L\beta_l = 0, \qquad \sum_{k=1}^K\gamma_{kl} = 0, \qquad \sum_{l=1}^L\gamma_{kl} = 0. \]
\(\to\) changes the estimates of the coefficients, but neither the variance estimate nor the predicted values.
Recall that (from CM5), in general, to test: \[ \begin{aligned} \mathcal{H}_0&: \beta_{p_0 + 1} = \cdots = \beta_p = 0 &\text{vs}\\ \mathcal{H}_1&: \exists~k\in \{p_0+1, \dotsc, p\} ~|~ \beta_k \neq 0 \end{aligned} \]
We can use the statistic: \[ F = \frac{ \|\hat{\mathbf{y}} - \hat{\mathbf{y}}_0\|^2 / (p - p_0) }{ \|\mathbf{y} - \hat{\mathbf{y}}\|^2 / (n - p) } = \frac{n - p}{p - p_0}\frac{RSS_0 - RSS}{RSS} \underset{\mathcal{H}_0}{\sim} \mathcal{F}^{p - p_0}_{n - p}. \]
Writing \(\mathbf{X}_{p_0} = (\mathbf{X}_1, \dotsc, \mathbf{X}_{p_0})\), this is the same as testing: \[ \begin{aligned} \mathcal{H}_0&: \mathbf{P}^{\mathbf{X}_{p_0}}\mathbf{y} = \mathbf{P}^{\mathbf{X}}\mathbf{y} &\text{vs}\\ \mathcal{H}_1&: \mathbf{P}^{\mathbf{X}_{p_0}}\mathbf{y} \neq \mathbf{P}^{\mathbf{X}}\mathbf{y} \end{aligned} \]
Take any two nested subsets of indexes \(\eta_1 \subset \eta_2\), and write: \[ \begin{aligned} \mathbf{X}_{\eta_k} &= (\mathbf{X}_i)_{i\in\eta_k} & \mathbf{P}^{\mathbf{X}_{\eta_k}}\mathbf{y} &= \hat{\mathbf{y}}_{\eta_k} & \mathbf{P}^{\mathbf{X}}\mathbf{y} &= \hat{\mathbf{y}} \end{aligned} \]
To test: \[ \begin{aligned} \mathcal{H}_0&: \text{"Larger model 2 does not bring any information compared to model 1"}\\ \mathcal{H}_1&: \text{"Larger model 2 does bring some information compared to model 1"} \end{aligned} \]
i.e. \[ \begin{aligned} \mathcal{H}_0&: \mathbf{P}^{\mathbf{X}_{\eta_1}}\mathbf{y} = \mathbf{P}^{\mathbf{X}_{\eta_2}}\mathbf{y} &\text{v.s.}&& \mathcal{H}_1&: \mathbf{P}^{\mathbf{X}_{\eta_1}}\mathbf{y} \neq \mathbf{P}^{\mathbf{X}_{\eta_2}}\mathbf{y} \end{aligned} \]
We can use: \[ \begin{aligned} F &= \frac{ \|\hat{\mathbf{y}}_{\eta_2} - \hat{\mathbf{y}}_{\eta_1}\|^2 / (|\eta_2| - |\eta_1|) }{ \|\mathbf{y} - \hat{\mathbf{y}}\|^2 / (n - p) }\\ &= \frac{ 1 }{ \hat{\sigma}^2 } (RSS_{\eta_1} - RSS_{\eta_2}) / (|\eta_2| - |\eta_1|) \underset{\mathcal{H}_0}{\sim} \mathcal{F}^{|\eta_2| - |\eta_1|}_{n - p}. \end{aligned} \]
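In R, this nested-model \(F\) test is `anova(fit_small, fit_big)`; a sketch on synthetic data with two hypothetical factors:

```r
set.seed(7)
grp1 <- factor(rep(c("a", "b"), each = 15))
grp2 <- factor(rep(c("u", "v", "w"), times = 10))
y <- rnorm(30, mean = 10 + 2 * (grp1 == "b"), sd = 1)

fit1 <- lm(y ~ grp1)          # smaller model (eta_1), 2 parameters
fit2 <- lm(y ~ grp1 + grp2)   # larger nested model (eta_2), 4 parameters

rss1 <- sum(residuals(fit1)^2); rss2 <- sum(residuals(fit2)^2)
F_hand <- ((rss1 - rss2) / (4 - 2)) / (rss2 / (30 - 4))
all.equal(F_hand, anova(fit1, fit2)[2, "F"])  # TRUE
```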
Model: \[ y_{i,k,l} = \mu + \alpha_k + \beta_l + \gamma_{kl} + \epsilon_i \] with, for all \(1\leq k \leq K\) and \(1\leq l \leq L\): \[ \alpha_1 = 0, \qquad \beta_1 = 0, \qquad \gamma_{1,l} = 0, \qquad \gamma_{k,1} = 0. \] (\(\to KL\) free parameters in total.)
Tests:
To test: \[ \begin{aligned} \mathcal{H}_0&: \mathbf{P}^{\mathbf{X}_{\mu}}\mathbf{y} = \mathbf{P}^{\mathbf{X}_{\mu,\alpha}}\mathbf{y} &\text{v.s.}&& \mathcal{H}_1&: \mathbf{P}^{\mathbf{X}_{\mu}}\mathbf{y} \neq \mathbf{P}^{\mathbf{X}_{\mu,\alpha}}\mathbf{y} \end{aligned} \] i.e. \(\mathcal{H}_0\): “factor 1 does not bring any information compared to intercept alone”.
We can use: \[ \begin{aligned} F &= \frac{n - KL}{K - 1}\frac{RSS_{\mu} - RSS_{\mu, \alpha}}{RSS}\\ &= \frac{R(\alpha | \mu) / (K - 1)}{\hat{\sigma}^2}\\ &\underset{\mathcal{H}_0}{\sim} \mathcal{F}^{K - 1}_{n - KL}. \end{aligned} \]
To test: \[ \begin{aligned} \mathcal{H}_0&: \mathbf{P}^{\mathbf{X}_{\mu,\alpha}}\mathbf{y} = \mathbf{P}^{\mathbf{X}_{\mu,\alpha, \beta}}\mathbf{y} &\text{v.s.}&& \mathcal{H}_1&: \mathbf{P}^{\mathbf{X}_{\mu,\alpha}}\mathbf{y} \neq \mathbf{P}^{\mathbf{X}_{\mu,\alpha,\beta}}\mathbf{y} \end{aligned} \] i.e. \(\mathcal{H}_0\): “factor 2 does not bring any information compared to factor 1”.
We can use: \[ \begin{aligned} F &= \frac{n - KL}{L - 1}\frac{RSS_{\mu, \alpha} - RSS_{\mu, \alpha, \beta}}{RSS}\\ &= \frac{R(\beta | \mu, \alpha) / (L - 1)}{\hat{\sigma}^2}\\ &\underset{\mathcal{H}_0}{\sim} \mathcal{F}^{L - 1}_{n - KL}. \end{aligned} \]
To test: \[ \begin{aligned} \mathcal{H}_0&: \mathbf{P}^{\mathbf{X}_{\mu,\alpha,\beta}}\mathbf{y} = \mathbf{P}^{\mathbf{X}_{\mu,\alpha, \beta,\gamma}}\mathbf{y} &\text{v.s.}&& \mathcal{H}_1&: \mathbf{P}^{\mathbf{X}_{\mu,\alpha,\beta}}\mathbf{y} \neq \mathbf{P}^{\mathbf{X}_{\mu,\alpha,\beta,\gamma}}\mathbf{y} \end{aligned} \] i.e. \(\mathcal{H}_0\): “Interactions do not bring any information compared to factors 1 and 2”.
We can use: \[ \begin{aligned} F &= \frac{n - KL}{(L - 1)(K - 1)}\frac{RSS_{\mu, \alpha, \beta} - RSS_{\mu, \alpha, \beta, \gamma}}{RSS}\\ &= \frac{R(\gamma | \mu, \alpha, \beta) / ((K - 1)(L - 1))}{\hat{\sigma}^2}\\ &\underset{\mathcal{H}_0}{\sim} \mathcal{F}^{(K - 1)(L - 1)}_{n - KL}. \end{aligned} \]
“Type I” Anova table:
|  | Df | SS | Mean SS | F | \(p\) value |
|---|---|---|---|---|---|
| factor 1 | \(K - 1\) | \(R(\alpha \vert \mu)\) | \(R(\alpha \vert \mu) / (K - 1)\) | \(\hat{\sigma}^{-2} R(\alpha \vert \mu) / (K - 1)\) | \(p\) |
| factor 2 | \(L - 1\) | \(R(\beta \vert \mu, \alpha)\) | \(R(\beta \vert \mu, \alpha) / (L - 1)\) | \(\hat{\sigma}^{-2} R(\beta \vert \mu, \alpha) / (L - 1)\) | \(p\) |
| interactions | \((K - 1)(L - 1)\) | \(R(\gamma \vert \mu, \alpha, \beta)\) | \(R(\gamma \vert \mu, \alpha, \beta) / ((K-1)(L-1))\) | \(\hat{\sigma}^{-2} R(\gamma \vert \mu, \alpha, \beta) / ((K-1)(L-1))\) | \(p\) |
| residuals | \(n - KL\) | \(RSS\) | \(\hat{\sigma}^2 = RSS / (n - KL)\) |  |  |
```r
fit <- lm(body_mass_g ~ species * sex, data = penguins_nona)
anova(fit)
```

```
## Analysis of Variance Table
## 
## Response: body_mass_g
##              Df    Sum Sq  Mean Sq F value    Pr(>F)    
## species       2 145190219 72595110 758.358 < 2.2e-16 ***
## sex           1  37090262 37090262 387.460 < 2.2e-16 ***
## species:sex   2   1676557   838278   8.757 0.0001973 ***
## Residuals   327  31302628    95727                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
```r
fit <- lm(body_mass_g ~ sex * species, data = penguins_nona)
anova(fit)
```

```
## Analysis of Variance Table
## 
## Response: body_mass_g
##              Df    Sum Sq  Mean Sq F value    Pr(>F)    
## sex           1  38878897 38878897 406.145 < 2.2e-16 ***
## species       2 143401584 71700792 749.016 < 2.2e-16 ***
## sex:species   2   1676557   838278   8.757 0.0001973 ***
## Residuals   327  31302628    95727                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
“Type I” Anova table:
|  | Df | SS | Mean SS | F | \(p\) value |
|---|---|---|---|---|---|
| factor 1 | \(K - 1\) | \(R(\alpha \vert \mu)\) | \(R(\alpha \vert \mu) / (K - 1)\) | \(\hat{\sigma}^{-2} R(\alpha \vert \mu) / (K - 1)\) | \(p\) |
| factor 2 | \(L - 1\) | \(R(\beta \vert \mu, \alpha)\) | \(R(\beta \vert \mu, \alpha) / (L - 1)\) | \(\hat{\sigma}^{-2} R(\beta \vert \mu, \alpha) / (L - 1)\) | \(p\) |
| interactions | \((K - 1)(L - 1)\) | \(R(\gamma \vert \mu, \alpha, \beta)\) | \(R(\gamma \vert \mu, \alpha, \beta) / ((K-1)(L-1))\) | \(\hat{\sigma}^{-2} R(\gamma \vert \mu, \alpha, \beta) / ((K-1)(L-1))\) | \(p\) |
| residuals | \(n - KL\) | \(RSS\) | \(\hat{\sigma}^2 = RSS / (n - KL)\) |  |  |
Attention: The order of the factors matters!
“Type II” Anova table:
|  | Df | SS | Mean SS | F | \(p\) value |
|---|---|---|---|---|---|
| factor 1 | \(K - 1\) | \(R(\alpha \vert \mu, \beta)\) | \(R(\alpha \vert \mu, \beta) / (K - 1)\) | \(\hat{\sigma}^{-2} R(\alpha \vert \mu, \beta) / (K - 1)\) | \(p\) |
| factor 2 | \(L - 1\) | \(R(\beta \vert \mu, \alpha)\) | \(R(\beta \vert \mu, \alpha) / (L - 1)\) | \(\hat{\sigma}^{-2} R(\beta \vert \mu, \alpha) / (L - 1)\) | \(p\) |
| interactions | \((K - 1)(L - 1)\) | \(R(\gamma \vert \mu, \alpha, \beta)\) | \(R(\gamma \vert \mu, \alpha, \beta) / ((K-1)(L-1))\) | \(\hat{\sigma}^{-2} R(\gamma \vert \mu, \alpha, \beta) / ((K-1)(L-1))\) | \(p\) |
| residuals | \(n - KL\) | \(RSS\) | \(\hat{\sigma}^2 = RSS / (n - KL)\) |  |  |
Symmetric in factors 1 and 2.
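Each Type II sum of squares is an extra sum of squares between nested models, e.g. \(R(\alpha|\mu, \beta) = RSS(\mu + \beta) - RSS(\mu + \alpha + \beta)\). A base-R sketch on simulated data (the `rss` helper is mine):

```r
set.seed(1)
f1 <- factor(rep(c("a", "b"), times = c(30, 10)))
f2 <- factor(sample(c("x", "y"), 40, replace = TRUE))
y  <- 1 + (f1 == "b") + 2 * (f2 == "y") + rnorm(40)

rss <- function(formula) sum(resid(lm(formula))^2)
## R(alpha | mu, beta): extra SS for factor 1, adjusted for factor 2
R_alpha <- rss(y ~ f2) - rss(y ~ f1 + f2)
## R(beta | mu, alpha): extra SS for factor 2, adjusted for factor 1
R_beta  <- rss(y ~ f1) - rss(y ~ f1 + f2)
c(R_alpha, R_beta)
```

These are exactly the main-effect sums of squares that `car::Anova` reports.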
fit <- lm(body_mass_g ~ sex * species, data = penguins_nona) car::Anova(fit)
## Anova Table (Type II tests) ## ## Response: body_mass_g ## Sum Sq Df F value Pr(>F) ## sex 37090262 1 387.460 < 2.2e-16 *** ## species 143401584 2 749.016 < 2.2e-16 *** ## sex:species 1676557 2 8.757 0.0001973 *** ## Residuals 31302628 327 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fit <- lm(bill_length_mm ~ sex * species, data = penguins_nona) car::Anova(fit)
## Anova Table (Type II tests) ## ## Response: bill_length_mm ## Sum Sq Df F value Pr(>F) ## sex 1135.7 1 211.8066 <2e-16 *** ## species 6975.6 2 650.4786 <2e-16 *** ## sex:species 24.5 2 2.2841 0.1035 ## Residuals 1753.3 327 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fit <- lm(bill_length_mm ~ sex + species, data = penguins_nona) car::Anova(fit)
## Anova Table (Type II tests) ## ## Response: bill_length_mm ## Sum Sq Df F value Pr(>F) ## sex 1135.7 1 210.17 < 2.2e-16 *** ## species 6975.6 2 645.44 < 2.2e-16 *** ## Residuals 1777.8 329 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Question: What is the relationship between bill length and bill depth, controlling for species?
fit_all <- lm(bill_length_mm ~ bill_depth_mm, data = females) summary(fit_all)
## ## Call: ## lm(formula = bill_length_mm ~ bill_depth_mm, data = females) ## ## Residuals: ## Min 1Q Median 3Q Max ## -11.0745 -3.0689 -0.7609 2.6776 17.5034 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 61.2214 3.1966 19.152 < 2e-16 *** ## bill_depth_mm -1.1643 0.1935 -6.018 1.13e-08 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 4.449 on 163 degrees of freedom ## Multiple R-squared: 0.1818, Adjusted R-squared: 0.1768 ## F-statistic: 36.22 on 1 and 163 DF, p-value: 1.128e-08
\[ y_{i} = \mu + \beta x_{i} + \epsilon_i \]
fit_species <- lm(bill_length_mm ~ bill_depth_mm + species, data = females) summary(fit_species)
## ## Call: ## lm(formula = bill_length_mm ~ bill_depth_mm + species, data = females) ## ## Residuals: ## Min 1Q Median 3Q Max ## -4.9886 -1.4503 -0.0603 1.4476 11.2797 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 25.0447 3.9377 6.360 1.99e-09 *** ## bill_depth_mm 0.6930 0.2230 3.108 0.00222 ** ## speciesChinstrap 9.3393 0.4648 20.092 < 2e-16 *** ## speciesGentoo 10.6515 0.8510 12.516 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.238 on 161 degrees of freedom ## Multiple R-squared: 0.7954, Adjusted R-squared: 0.7916 ## F-statistic: 208.7 on 3 and 161 DF, p-value: < 2.2e-16
\[ y_{i,k} = \mu + \alpha_k + \beta x_{i} + \epsilon_i \]
fit_species <- lm(bill_length_mm ~ bill_depth_mm + species, data = females) car::Anova(fit_species)
## Anova Table (Type II tests) ## ## Response: bill_length_mm ## Sum Sq Df F value Pr(>F) ## bill_depth_mm 48.41 1 9.6624 0.002224 ** ## species 2419.64 2 241.4533 < 2.2e-16 *** ## Residuals 806.70 161 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fit_int <- lm(bill_length_mm ~ bill_depth_mm + species + bill_depth_mm:species, data = females) fit_int
## ## Call: ## lm(formula = bill_length_mm ~ bill_depth_mm + species + bill_depth_mm:species, ## data = females) ## ## Coefficients: ## (Intercept) bill_depth_mm ## 31.1671 0.3456 ## speciesChinstrap speciesGentoo ## -2.5348 -8.8729 ## bill_depth_mm:speciesChinstrap bill_depth_mm:speciesGentoo ## 0.6745 1.2887
\[ y_{i,k} = \mu + \alpha_k + \beta_k x_{i} + \epsilon_i \]
fit_int <- lm(bill_length_mm ~ bill_depth_mm + species + bill_depth_mm:species, data = females) car::Anova(fit_int)
## Anova Table (Type II tests) ## ## Response: bill_length_mm ## Sum Sq Df F value Pr(>F) ## bill_depth_mm 48.41 1 9.8428 0.002032 ** ## species 2419.64 2 245.9610 < 2.2e-16 *** ## bill_depth_mm:species 24.62 2 2.5029 0.085070 . ## Residuals 782.08 159 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For any observation \(1 \leq i \leq n\) in group \(1 \leq k \leq K\), we write: \[ y_{i,k} = \mu + \alpha_k + (\beta + \gamma_k) \times x_{i} + \epsilon_i \qquad \epsilon_i\sim\mathcal{N}(0, \sigma^2)~iid \]
Identifiability: \(\alpha_1 = \gamma_1 = 0\).
\[ \text{With $K=2$:}\qquad \mathbf{X} = \begin{pmatrix} 1 & 0 & x_1 & 0\\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & x_{n_1} & 0 &\\ 1 & 1 & x_{n_1 + 1} & x_{n_1 + 1} \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 1 & x_{n_1 + n_2} & x_{n_1 + n_2}\\ \end{pmatrix} \]
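With treatment contrasts (R's default), `model.matrix` builds exactly this design; a sketch with \(K = 2\) simulated groups:

```r
set.seed(2)
## Identifiability via treatment contrasts: alpha_1 = gamma_1 = 0,
## the first factor level acts as the reference group.
g <- factor(rep(c("g1", "g2"), each = 5))
x <- rnorm(10)
X <- model.matrix(~ g * x)
colnames(X)  # "(Intercept)" "gg2" "x" "gg2:x" -- i.e. 1, 1{k=2}, x_i, 1{k=2} x_i
```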
F-tests of effects and interactions can be done as before.
More than one predictor? \(\to\) not a problem.
More than one factor? \(\to\) not a problem.
Attention: as models become more complex, they become harder to interpret and test.
Attention: diagnostics on the residuals (CM6) are still needed.
fit_species <- lm(bill_length_mm ~ bill_depth_mm + species, data = females) par(mfrow = c(2,2)) plot(fit_species)
\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]
Credit: Lorenzo De Stefani
library(lme4) kable(head(sleepstudy))
Reaction | Days | Subject |
---|---|---|
249.5600 | 0 | 308 |
258.7047 | 1 | 308 |
250.8006 | 2 | 308 |
321.4398 | 3 | 308 |
356.8519 | 4 | 308 |
414.6901 | 5 | 308 |
For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + \alpha_k + (\beta + \gamma_k) \times x_{i} + \epsilon_{i, k} \]
Problem: What if there are many groups, with few observations?
\(\to\) Estimation problem.
fit_ancova <- lm(Reaction ~ Days * Subject, data = sleepstudy) summary(fit_ancova)
## ## Call: ## lm(formula = Reaction ~ Days * Subject, data = sleepstudy) ## ## Residuals: ## Min 1Q Median 3Q Max ## -106.397 -10.692 -0.177 11.417 132.510 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 244.193 15.042 16.234 < 2e-16 *** ## Days 21.765 2.818 7.725 1.74e-12 *** ## Subject309 -39.138 21.272 -1.840 0.067848 . ## Subject310 -40.708 21.272 -1.914 0.057643 . ## Subject330 45.492 21.272 2.139 0.034156 * ## Subject331 41.546 21.272 1.953 0.052749 . ## Subject332 20.059 21.272 0.943 0.347277 ## Subject333 30.826 21.272 1.449 0.149471 ## Subject334 -4.030 21.272 -0.189 0.850016 ## Subject335 18.842 21.272 0.886 0.377224 ## Subject337 45.911 21.272 2.158 0.032563 * ## Subject349 -29.081 21.272 -1.367 0.173728 ## Subject350 -18.358 21.272 -0.863 0.389568 ## Subject351 16.954 21.272 0.797 0.426751 ## Subject352 32.179 21.272 1.513 0.132535 ## Subject369 10.775 21.272 0.507 0.613243 ## Subject370 -33.744 21.272 -1.586 0.114870 ## Subject371 9.443 21.272 0.444 0.657759 ## Subject372 22.852 21.272 1.074 0.284497 ## Days:Subject309 -19.503 3.985 -4.895 2.61e-06 *** ## Days:Subject310 -15.650 3.985 -3.928 0.000133 *** ## Days:Subject330 -18.757 3.985 -4.707 5.84e-06 *** ## Days:Subject331 -16.499 3.985 -4.141 5.88e-05 *** ## Days:Subject332 -12.198 3.985 -3.061 0.002630 ** ## Days:Subject333 -12.623 3.985 -3.168 0.001876 ** ## Days:Subject334 -9.512 3.985 -2.387 0.018282 * ## Days:Subject335 -24.646 3.985 -6.185 6.07e-09 *** ## Days:Subject337 -2.739 3.985 -0.687 0.492986 ## Days:Subject349 -8.271 3.985 -2.076 0.039704 * ## Days:Subject350 -2.261 3.985 -0.567 0.571360 ## Days:Subject351 -15.331 3.985 -3.848 0.000179 *** ## Days:Subject352 -8.198 3.985 -2.057 0.041448 * ## Days:Subject369 -10.417 3.985 -2.614 0.009895 ** ## Days:Subject370 -3.709 3.985 -0.931 0.353560 ## Days:Subject371 -12.576 3.985 -3.156 0.001947 ** ## Days:Subject372 -10.467 3.985 -2.627 0.009554 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 25.59 on 144 degrees of freedom ## Multiple R-squared: 0.8339, Adjusted R-squared: 0.7936 ## F-statistic: 20.66 on 35 and 144 DF, p-value: < 2.2e-16
For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]
For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \]
\[ \mathbb{E}[y_{i,k}] = \mu \]
\[ \mathbb{V}[y_{i,k}] = s_A^2 + \sigma^2 \]
\[ \mathbb{C}[y_{i,k}, y_{i',k'}] = \begin{cases} s_A^2 & \text{if } k = k',\ (i,k) \neq (i',k') \\ 0 & \text{if } k \neq k'. \end{cases} \]
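These three moment identities are easy to check by simulation (a sketch with arbitrary values \(s_A = 2\), \(\sigma = 1\)):

```r
set.seed(3)
s_A <- 2; sigma <- 1; B <- 20000
## B replicates of the random-intercept model y_{i,k} = mu + A_k + eps_{i,k}
A1 <- rnorm(B, sd = s_A)                  # random effect of group 1
A2 <- rnorm(B, sd = s_A)                  # random effect of group 2
y11 <- 10 + A1 + rnorm(B, sd = sigma)     # observation 1, group 1
y21 <- 10 + A1 + rnorm(B, sd = sigma)     # observation 2, same group
y12 <- 10 + A2 + rnorm(B, sd = sigma)     # observation in another group
var(y11)       # ~ s_A^2 + sigma^2 = 5
cov(y11, y21)  # ~ s_A^2 = 4 (same group)
cov(y11, y12)  # ~ 0 (different groups)
```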
ANOVA \[ y_{i,k} = \mu + \alpha_{k} + \epsilon_{i,k} \quad \epsilon_{i,k} \sim \mathcal{N}(0, \sigma^2) ~ iid \]
\[ \mathcal{H}_0: \alpha_1 = \cdots = \alpha_K = 0 \]
Mixed Model \[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \quad A_k \sim \mathcal{N}(0, s_A^2) ~ iid \quad \epsilon_{i,k} \sim \mathcal{N}(0, \sigma^2) ~ iid \]
\[ \mathcal{H}_0: s_A^2 = 0 \]
\[~\]
\[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \]
\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \mathbf{Z}\mathbf{U} + \boldsymbol{\epsilon} \]
\[ \mathbf{y} = \text{vec}((y_{i,k})_{i, k}) = \begin{pmatrix} y_{1,1}\\ y_{2,1}\\ \vdots\\ y_{n, 1}\\ y_{1, 2}\\ \vdots\\ y_{n, 2}\\ \vdots\\ y_{n, K} \end{pmatrix} \qquad \mathbf{Z} = \begin{pmatrix} 1 & 0 & \cdots & 0\\ 1 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & \cdots & 1\\ \end{pmatrix} \qquad \begin{aligned} \mathbf{U} &\sim \mathcal{N}(\mathbf{0}, s_A^2\mathbf{I}_{K}) \\ \boldsymbol{\epsilon} &\sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I}_{nK}) \end{aligned} \]
\[ y_{i,k} = \mu + A_{k} + \epsilon_{i,k} \]
\[~\]
\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \mathbf{E} \qquad \mathbf{E} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}) \]
\[~\]
\[ \mathbf{\Sigma} = \begin{pmatrix} \mathbf{R} & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & \mathbf{0}\\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{R}\\ \end{pmatrix} \qquad \mathbf{R} = \begin{pmatrix} \sigma^2 + s_A^2 & s_A^2 & \cdots & s_A^2\\ s_A^2 & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & s_A^2\\ s_A^2 & \cdots & s_A^2 & \sigma^2 + s_A^2\\ \end{pmatrix} \]
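Equivalently \(\mathbf{\Sigma} = \sigma^2\mathbf{I} + s_A^2\mathbf{Z}\mathbf{Z}^T\), which can be built explicitly; a sketch with \(n = 3\) observations in each of \(K = 2\) groups (variance values arbitrary):

```r
n <- 3; K <- 2; s_A <- 2; sigma <- 1
g <- factor(rep(seq_len(K), each = n))
Z <- model.matrix(~ g - 1)        # nK x K matrix of group indicators
Sigma <- sigma^2 * diag(n * K) + s_A^2 * Z %*% t(Z)
Sigma  # block-diagonal: diagonal sigma^2 + s_A^2, off-diagonal s_A^2 within a group
```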
Linear Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}_n) \]
\[~\]
Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}) \]
\[~\] with \(\mathbf{\Sigma}\) a structured covariance matrix.
For observation \(i\) of group \(k\): \[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]
\[ \mathbb{E}[y_{i,k}] = \mu + \beta x_i \]
\[ \mathbb{V}[y_{i,k}] = s_A^2 + s_G^2x_i^2 + \sigma^2 \]
\[ \mathbb{C}[y_{i,k}, y_{i',k'}] = \begin{cases} s_A^2 + s_G^2 x_i x_{i'} & \text{if } k = k',\ (i,k) \neq (i',k') \\ 0 & \text{if } k \neq k'. \end{cases} \]
\[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]
\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \beta \mathbf{x} + \mathbf{Z}_1\mathbf{U}_1 + \mathbf{Z}_2\mathbf{U}_2 + \boldsymbol{\epsilon} \]
\[ \mathbf{y} = \begin{pmatrix} y_{1,1}\\ y_{2,1}\\ \vdots\\ y_{n, 1}\\ y_{1, 2}\\ \vdots\\ y_{n, 2}\\ \vdots\\ y_{n, K} \end{pmatrix} \qquad \mathbf{Z}_2 = \begin{pmatrix} x_1 & 0 & \cdots & 0\\ x_2 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ x_n & 0 & \cdots & 0\\ 0 & x_1 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 0 & x_2 & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & \cdots & x_n\\ \end{pmatrix} \qquad \begin{aligned} \mathbf{U}_1 &\sim \mathcal{N}(\mathbf{0}, s_A^2\mathbf{I}_{K}) \\ \mathbf{U}_2 &\sim \mathcal{N}(\mathbf{0}, s_G^2\mathbf{I}_{K}) \\ \boldsymbol{\epsilon} &\sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I}_{nK}) \end{aligned} \]
\[ y_{i,k} = \mu + A_{k} + (\beta + G_k) \times x_{i} + \epsilon_{i,k} \]
\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \beta \mathbf{x} + \mathbf{Z}_1\mathbf{U}_1 + \mathbf{Z}_2\mathbf{U}_2 + \boldsymbol{\epsilon} \]
\[ \mathbf{y} = \mu\mathbf{1}_{nK} + \beta\mathbf{x} + \mathbf{E} \qquad \mathbf{E} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}) \]
\[ \mathbf{\Sigma} = \sigma^2 \mathbf{I} + s_A^2 \mathbf{Z}_1 \mathbf{Z}_1^T + s_G^2 \mathbf{Z}_2 \mathbf{Z}_2^T = \begin{pmatrix} \mathbf{R} & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & \mathbf{0}\\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{R}\\ \end{pmatrix} \]
\[ \mathbf{R} = \begin{pmatrix} \sigma^2 + s_A^2 + s_G^2 x_1^2 & s_A^2 + s_G^2 x_2x_1 & \cdots & s_A^2 + s_G^2 x_nx_1\\ s_A^2 + s_G^2 x_1x_2 & \ddots & \ddots & \vdots\\ \vdots & \ddots & \ddots & s_A^2 + s_G^2 x_{n-1}x_n\\ s_A^2 + s_G^2 x_1x_n & \cdots & s_A^2 + s_G^2 x_nx_{n-1} & \sigma^2 + s_A^2 + s_G^2 x_n^2\\ \end{pmatrix} \]
fit_lmer <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy) fit_lmer
## Linear mixed model fit by REML ['lmerMod'] ## Formula: Reaction ~ Days + (Days | Subject) ## Data: sleepstudy ## REML criterion at convergence: 1743.628 ## Random effects: ## Groups Name Std.Dev. Corr ## Subject (Intercept) 24.741 ## Days 5.922 0.07 ## Residual 25.592 ## Number of obs: 180, groups: Subject, 18 ## Fixed Effects: ## (Intercept) Days ## 251.41 10.47
Linear Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}_n) \]
\[~\]
Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}) \]
\[~\] with \(\mathbf{\Sigma}\) a structured covariance matrix.
\[~\] \(\to\) General fitting and testing are more difficult.
Repeated Measures \[ y_{i,k,t} = \mu + \alpha_k + \beta_t + \gamma_{kt} + \epsilon_{i,k,t} \]
\[ \mathbb{C}[y_{i,k,t}, y_{i',k',t'}] = \begin{cases} \sigma^2 \rho & \text{if } (i,k) = (i',k'),\ t \neq t' \\ 0 & \text{if } (i,k) \neq (i',k'). \end{cases} \] \[~\] \(\to\) Constant covariance between time measures (compound symmetry).
Repeated Measures \[ y_{i,k,t} = \mu + \alpha_k + \beta_t + \gamma_{kt} + \epsilon_{i,k,t} \]
\[ \mathbb{C}[y_{i,k,t}, y_{i',k',t'}] = \begin{cases} \sigma^2 \rho^{|t-t'|} & \text{if } (i,k) = (i',k') \\ 0 & \text{otherwise.} \end{cases} \] \[~\] \(\to\) AR(1) (with \(|\rho| < 1\))
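The AR(1) within-subject block is conveniently built with `outer` (a sketch; the values of \(\sigma\) and \(\rho\) are arbitrary):

```r
sigma <- 1.5; rho <- 0.6; times <- 1:4
## Within-subject covariance block: C[y_t, y_t'] = sigma^2 * rho^|t - t'|
R_ar1 <- sigma^2 * rho^abs(outer(times, times, "-"))
R_ar1  # diagonal sigma^2, decaying geometrically away from the diagonal
```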
Spatial Measures \[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]
\[ \mathbb{C}[y_{i}, y_{i'}] = \begin{cases} \gamma^2 e^{-d(i,i')/\rho} & \text{if } i \neq i' \\ \sigma^2 + \gamma^2 & \text{if } i = i' \end{cases} \]
Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}(\boldsymbol{\psi})) \] where \(\mathbf{\Sigma}(\boldsymbol{\psi})\) depends on parameters \(\boldsymbol{\psi}\).
\[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}) \]
Special case: \[ \mathbf{\Sigma} = \sigma^2 \mathbf{C} \] with \(\mathbf{C}\) known.
Then: \[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{y} \]
\[ \hat{\sigma}^2 = \frac{\|\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \|^2_{\mathbf{C}^{-1}}}{n-p} = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T \mathbf{C}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{n-p} \]
Proof: Use Cholesky decomposition.
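The Cholesky argument also gives the practical recipe: factor \(\mathbf{C} = \mathbf{L}\mathbf{L}^T\), whiten \(\mathbf{y}\) and \(\mathbf{X}\) by \(\mathbf{L}^{-1}\), then run ordinary least squares. A sketch on simulated data, checking it against the closed form:

```r
set.seed(4)
n <- 50
X <- cbind(1, rnorm(n))
C <- 0.5 * diag(n) + 0.5                  # a known correlation structure (equicorrelation)
y <- X %*% c(2, -1) + t(chol(C)) %*% rnorm(n)

## Closed form: beta_hat = (X' C^-1 X)^-1 X' C^-1 y
Ci <- solve(C)
beta_closed <- solve(t(X) %*% Ci %*% X, t(X) %*% Ci %*% y)

## Cholesky whitening: C = L L', then OLS on (L^-1 X, L^-1 y)
L <- t(chol(C))                           # chol() returns the upper factor
beta_chol <- qr.solve(forwardsolve(L, X), forwardsolve(L, y))
all.equal(c(beta_closed), c(beta_chol))   # TRUE
```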
Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}(\boldsymbol{\psi})) \]
Maximum-Likelihood Estimators: \[ \hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\psi}} = \underset{\boldsymbol{\beta}, \boldsymbol{\psi}}{\operatorname{argmax}} \log L(\boldsymbol{\beta}, \boldsymbol{\psi} | \mathbf{y}) \]
\[ \hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\psi}} = \underset{\boldsymbol{\beta}, \boldsymbol{\psi}}{\operatorname{argmin}} \left\{ \log |\mathbf{\Sigma}(\boldsymbol{\psi})| + (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T \mathbf{\Sigma}(\boldsymbol{\psi})^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \right\} \]
\[ S(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T \mathbf{\Sigma}(\boldsymbol{\psi})^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \]
\[ \nabla S(\hat{\boldsymbol{\beta}}) = 2 \mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{X}\hat{\boldsymbol{\beta}} - 2 \mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{y} = 0 \]
\[ \hat{\boldsymbol{\beta}} = h(\boldsymbol{\psi}) = (\mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{X})^{-1} \mathbf{X}^T\mathbf{\Sigma}(\boldsymbol{\psi})^{-1}\mathbf{y} \]
Maximization in \(\boldsymbol{\psi}\):
\[ \hat{\boldsymbol{\psi}} = \underset{\boldsymbol{\psi}}{\operatorname{argmin}} \left\{ \log |\mathbf{\Sigma}(\boldsymbol{\psi})| + (\mathbf{y} - \mathbf{X}h(\boldsymbol{\psi}))^T \mathbf{\Sigma}(\boldsymbol{\psi})^{-1}(\mathbf{y} - \mathbf{X}h(\boldsymbol{\psi})) \right\} \]
\[ \hat{\boldsymbol{\beta}} = h(\hat{\boldsymbol{\psi}}) \]
\(\to\) Optimization in \(\boldsymbol{\psi}\) can be complex.
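For a single grouping factor the whole profiling scheme fits in a few lines; a sketch for the random-intercept model with \(\boldsymbol{\psi} = (s_A^2, \sigma^2)\), optimizing the profiled criterion numerically on simulated data (true values \(s_A^2 = 4\), \(\sigma^2 = 1\)):

```r
set.seed(5)
n <- 10; K <- 15
g <- factor(rep(seq_len(K), each = n))
Z <- model.matrix(~ g - 1)
X <- cbind(1, rnorm(n * K))
y <- X %*% c(1, 2) + Z %*% rnorm(K, sd = 2) + rnorm(n * K)

## Profiled objective: log|Sigma(psi)| + (y - X h(psi))' Sigma(psi)^-1 (y - X h(psi)),
## parameterized on the log scale to keep the variances positive
objective <- function(logpsi) {
  Sigma <- exp(logpsi[1]) * tcrossprod(Z) + exp(logpsi[2]) * diag(n * K)
  Si <- solve(Sigma)
  beta <- solve(t(X) %*% Si %*% X, t(X) %*% Si %*% y)  # h(psi): profiled beta
  r <- y - X %*% beta
  as.numeric(determinant(Sigma)$modulus) + as.numeric(t(r) %*% Si %*% r)
}
fit <- optim(c(0, 0), objective)
exp(fit$par)  # estimates of (s_A^2, sigma^2)
```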
Linear Model \[ \hat{\sigma}^2_{ML} = \frac{\|\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \|^2}{n} \qquad \text{vs} \qquad \hat{\sigma}^2_{REML} = \frac{\|\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \|^2}{n-p} \] \[ \hat{\sigma}^2_{REML} = \underset{\sigma^2}{\operatorname{argmax}} \log L(0, \sigma^2 | \mathbf{P}^{\mathbf{X}^\bot}\mathbf{y}) \]
\[ \mathbf{P}^{\mathbf{X}^\bot}\mathbf{y} = \hat{\boldsymbol{\epsilon}} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{P}^{\mathbf{X}^\bot}) \]
\[ \hat{\sigma}^2_{REML} = \underset{\sigma^2}{\operatorname{argmin}} \left\{ \log |\sigma^2 \mathbf{P}^{\mathbf{X}^\bot}| + (\mathbf{P}^{\mathbf{X}^\bot}\mathbf{y})^T (\sigma^2 \mathbf{P}^{\mathbf{X}^\bot})^{-1}(\mathbf{P}^{\mathbf{X}^\bot}\mathbf{y}) \right\} \]
\[ \hat{\sigma}^2_{REML} = \underset{\sigma^2}{\operatorname{argmin}} \left\{ (n-p) \log(\sigma^2) + \frac{1}{\sigma^2} \|\mathbf{P}^{\mathbf{X}^\bot}\mathbf{y}\|^2 \right\} \]
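In the fixed-effects model this criterion indeed recovers the unbiased estimator \(RSS/(n-p)\), which is what `summary.lm` reports; a sketch on simulated data:

```r
set.seed(6)
n <- 30; p <- 2
X <- cbind(1, rnorm(n))
y <- as.vector(X %*% c(1, 2)) + rnorm(n, sd = 3)
fit <- lm(y ~ X - 1)
rss <- sum(resid(fit)^2)

sigma2_ml   <- rss / n        # ML: biased downwards
sigma2_reml <- rss / (n - p)  # REML: equals summary(fit)$sigma^2
## The REML criterion (n - p) log(sigma^2) + RSS / sigma^2 is minimized at RSS / (n - p):
crit <- function(s2) (n - p) * log(s2) + rss / s2
optimize(crit, c(0.01, 100))$minimum
```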
Mixed Model \[ \mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol{\beta}, \mathbf{\Sigma}(\boldsymbol{\psi})) \]
Let \(\mathbf{T}\) be a \((n-p)\times n\) matrix such that \(\mathbf{T}\mathbf{X}=\mathbf{0}\). \[ \tilde{\mathbf{y}} = \mathbf{T}\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \mathbf{T}\mathbf{\Sigma}(\boldsymbol{\psi})\mathbf{T}^T) \]
\[ \hat{\boldsymbol{\psi}}_{REML} = \underset{\boldsymbol{\psi}}{\operatorname{argmax}} \log L(\boldsymbol{\psi} | \tilde{\mathbf{y}}) \]
\(\to\) Independent of \(\boldsymbol{\beta}\).
\(\to\) Independent of the choice of \(\mathbf{T}\).
\(\to\) Often, \(\mathbf{T} = \mathbf{P}^{\mathbf{X}^\bot}\).
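A quick sketch checking that \(\mathbf{P}^{\mathbf{X}^\bot} = \mathbf{I} - \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\) annihilates \(\mathbf{X}\) and has rank \(n - p\), so only \(n - p\) of its rows are independent:

```r
set.seed(7)
n <- 20; p <- 3
X <- cbind(1, rnorm(n), rnorm(n))
P_orth <- diag(n) - X %*% solve(t(X) %*% X) %*% t(X)  # P^{X-perp}
max(abs(P_orth %*% X))  # ~ 0: T X = 0
qr(P_orth)$rank         # n - p
```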