24/01/2024
\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \forall 1 \leq i \leq n\]
[Scatter plot: sales vs TV]
\[ (\hat{\beta}_0, \hat{\beta}_1) = \underset{(\beta_0, \beta_1) \in \mathbb{R}^2}{\operatorname{argmin}} \left\{ \sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \right\}\]
\[ \begin{aligned} \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1\bar{x} & \hat{\beta}_1 &= \frac{s_{\mathbf{y},\mathbf{x}}^2}{s_{\mathbf{x}}^2} \\ \mathbb{E}[\hat{\beta}_0] &= \beta_0 & \mathbb{E}[\hat{\beta}_1] &= \beta_1 \\ \mathbb{Var}[\hat{\beta}_0] &= \frac{\sigma^2}{n} \left( 1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2} \right) & \mathbb{Var}[\hat{\beta}_1] &= \frac{\sigma^2}{n}\frac{1}{s_{\mathbf{x}}^2} \\ \mathbb{Cov}[\hat{\beta}_0; \hat{\beta}_1] &= - \frac{\sigma^2}{n} \frac{\bar{x}}{s_{\mathbf{x}}^2} \end{aligned} \]
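These closed-form expressions are easy to check numerically. Below is a minimal R sketch on simulated data (the design, sample size and variable names are illustrative assumptions, not taken from the slides), computing \(\hat{\beta}_0\) and \(\hat{\beta}_1\) from empirical means and (co)variances and comparing them to lm():

set.seed(1289)
n <- 100
x <- runif(n, 0, 3)                    # illustrative design points
y <- -2 + 3 * x + rnorm(n, sd = 1)     # simulate y_i = beta_0 + beta_1 x_i + eps_i

# Closed-form OLS estimates (the 1/(n-1) factors in cov() and var() cancel)
beta1_hat <- cov(y, x) / var(x)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Same estimates from lm()
rbind(by_hand = c(beta0_hat, beta1_hat),
      lm      = coef(lm(y ~ x)))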
Confidence interval for \((\beta_0, \beta_1)\)?
Can we test \(\beta_1 = 0\) (i.e. no linear trend)?
\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \forall 1 \leq i \leq n\]
[Scatter plot: sales vs TV]
\[ \epsilon_i \underset{\text{iid}}{\operatorname{\sim}} \mathcal{N}(0, \sigma^2) \quad \text{for} \quad 1 \leq i \leq n \]
Let \(X\) be a Gaussian r.v. with expectation \(\mu\) and variance \(\sigma^2\): \[ X \sim \mathcal{N}(\mu, \sigma^2). \]
It admits a probability density function (pdf): \[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \quad \forall x \in \mathbb{R}, \]
And a characteristic function: \[ \phi_{X}(t) = \mathbb{E}[e^{itX}] = e^{i\mu t - \frac{\sigma^2 t^2}{2}} \quad \forall t \in \mathbb{R}. \]
If \(X \sim \mathcal{N}(\mu, \sigma^2)\), then for any \((a, b) \in \mathbb{R}^2\),
\[
Y = aX + b \sim \mathcal{N}(a\mu + b, a^2\sigma^2).
\]
Model: \[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i \quad \text{with} \quad \epsilon_i \underset{\text{iid}}{\operatorname{\sim}} \mathcal{N}(0, \sigma^2) \quad \text{for} \quad 1 \leq i \leq n \]
Distribution of \(y_i\): \[ y_i \underset{\text{iid}}{\operatorname{\sim}} \mathcal{N}(\beta_0 + \beta_1 x_i, \sigma^2) \quad \text{for} \quad 1 \leq i \leq n \]
If \(X\) is a r.v. that admits a density \(f_{\theta}\) that depends on a parameter \(\theta\), then the likelihood function is the density, seen as a function of \(\theta\) given an outcome \(X = x\): \[ \theta \mapsto L(\theta | x) = f_{\theta}(x). \]
Example:
Assume that we observe one realization \(x_{obs}\) of a Gaussian random variable \(X\), with unknown expectation \(\mu\), but known variance \(\sigma^2 = 1\).
Then the likelihood of the observation \(x_{obs}\) is the function: \[
\mu \mapsto \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_{obs} - \mu)^2}{2\sigma^2}\right).
\]
The Maximum Likelihood (ML) estimator \(\hat{\theta}\) of the parameter \(\theta\) is the one that maximizes the likelihood function in \(\theta\), given an observation \(x\):
\[
\hat{\theta} = \underset{\theta}{\operatorname{argmax}} L(\theta | x).
\]
Example:
Let \(x_{obs}\) be one realization of a Gaussian r.v. \(X\), with unknown expectation \(\mu\), but known variance \(\sigma^2 = 1\).
Then the ML estimator \(\hat{\mu}\) of \(\mu\) is: \[
\hat{\mu} = \underset{\mu \in \mathbb{R}}{\operatorname{argmax}} \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_{obs} - \mu)^2}{2\sigma^2}\right).
\]
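For this toy example the maximizer is of course \(\hat{\mu} = x_{obs}\). A quick illustrative check in R (the observed value is hypothetical), maximizing the likelihood numerically with optimize():

x_obs <- 1.7                                         # hypothetical observation
lik <- function(mu) dnorm(x_obs, mean = mu, sd = 1)  # likelihood, sigma^2 = 1

# Numerical maximization over an interval containing x_obs
optimize(lik, interval = c(-10, 10), maximum = TRUE)$maximum
# should be (approximately) equal to x_obs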
Distribution of \(y_i\): \[ y_i \underset{\text{iid}}{\operatorname{\sim}} \mathcal{N}(\beta_0 + \beta_1 x_i, \sigma^2) \quad \text{for} \quad 1 \leq i \leq n \]
Likelihood of \(\mathbf{y} = (y_1, \dotsc, y_n)^T\):
\[ \begin{aligned} &L(\beta_0, \beta_1, \sigma^2 | y_1, \dotsc, y_n) = \prod_{i = 1}^n L(\beta_0, \beta_1, \sigma^2 | y_i) & [ind.]\\ \end{aligned} \]
and \[ \begin{aligned} L(\beta_0, \beta_1, \sigma^2 | y_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right) \end{aligned} \]
\[ \begin{aligned} &\log L(\beta_0, \beta_1, \sigma^2 | y_1, \dotsc, y_n) = \\ &\qquad - \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_i)^2\\ \end{aligned} \]
\[ \begin{aligned} &\log L(\beta_0, \beta_1, \sigma^2 | \mathbf{y}) = \\ &\qquad - \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\underbrace{\sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_i)^2}_{\text{Sum of Squares } SS(\beta_0, \beta_1)}\\ \end{aligned} \]
\[ (\hat{\beta}_0^{ML}, \hat{\beta}_1^{ML}) = (\hat{\beta}_0^{OLS}, \hat{\beta}_1^{OLS}) \]
\(\hookrightarrow\) The ML estimators are the same as the OLS estimators.
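A sketch of this equivalence on simulated data (setup and names are illustrative assumptions): maximize the log-likelihood numerically with optim() and compare the result to lm().

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)

# Negative log-likelihood, parametrized as (beta0, beta1, log(sigma))
negloglik <- function(par) {
  -sum(dnorm(y, mean = par[1] + par[2] * x, sd = exp(par[3]), log = TRUE))
}

ml_fit <- optim(c(0, 0, 0), negloglik)
rbind(ML  = ml_fit$par[1:2],
      OLS = coef(lm(y ~ x)))
# the two rows should agree up to the optimizer's tolerance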
\[ \begin{aligned} \hat{\sigma}^2_{ML} &= \underset{\sigma^2 \in \mathbb{R}_+^*}{\operatorname{argmax}} \log L(\hat{\beta}_0, \hat{\beta}_1, \sigma^2 | \mathbf{y})\\ &= \underset{\sigma^2 \in \mathbb{R}_+^*}{\operatorname{argmax}} - \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i = 1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2\\ &= \underset{\sigma^2 \in \mathbb{R}_+^*}{\operatorname{argmin}} \frac{n}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i = 1}^n \hat{\epsilon}_i^2\\ \end{aligned} \]
\[ \frac{\partial}{\partial \sigma^2} \left[\frac{n}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i = 1}^n \hat{\epsilon}_i^2\right] = \cdots = 0 \]
\[ \begin{aligned} \frac{\partial}{\partial \sigma^2} \left[\frac{n}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i = 1}^n \hat{\epsilon}_i^2\right] &= \frac{n}{2} \frac{1}{\sigma^2} - \frac{1}{2\sigma^4}\sum_{i = 1}^n \hat{\epsilon}_i^2 = 0 \end{aligned} \] And we get:
\[ \hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i = 1}^n \hat{\epsilon}_i^2 \]
\[ \hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i = 1}^n \hat{\epsilon}_i^2 \neq \frac{1}{n-2}\sum_{i = 1}^n \hat{\epsilon}_i^2 = \hat{\sigma}^2 \]
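In R, sigma(fit) returns the square root of the unbiased estimator (division by \(n - 2\)), not the ML estimator. A minimal illustrative check on simulated data (setup assumed, not from the slides):

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)
res <- residuals(fit)

sigma2_ml       <- sum(res^2) / n         # ML estimator: divide by n
sigma2_unbiased <- sum(res^2) / (n - 2)   # unbiased estimator: divide by n - 2

c(ML = sigma2_ml, unbiased = sigma2_unbiased, lm = sigma(fit)^2)
# sigma(fit)^2 matches the (n - 2) version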
\[ \begin{aligned} \mathbb{E}\left[ \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} \right] &= \begin{pmatrix} \beta_0\\ \beta_1 \end{pmatrix} \\ \mathbb{Var}\left[ \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} \right] &= \frac{\sigma^2}{n} \begin{pmatrix} 1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2} & -\frac{\bar{x}}{s_{\mathbf{x}}^2}\\ -\frac{\bar{x}}{s_{\mathbf{x}}^2} & \frac{1}{s_{\mathbf{x}}^2} \end{pmatrix} = \sigma^2 \mathbf{V}_n \end{aligned} \]
When the variance \(\sigma^2\) is known, we find that the vector of estimators \((\hat{\beta}_0, \hat{\beta}_1)\) is a Gaussian vector, with expectation \((\beta_0, \beta_1)\) and variance \(\sigma^2 \mathbf{V}_n\):
\[ \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \beta_0\\ \beta_1 \end{pmatrix}; \sigma^2 \mathbf{V}_n \right) \]
Simulate according to the model: \[\mathbf{y} = -2 \cdot \mathbb{1} + 3 \cdot \mathbf{x} + \boldsymbol{\epsilon}\]
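A minimal R sketch of this simulation (the design x, the noise level, and the number of replications are illustrative assumptions): simulate many datasets, refit the model on each, and compare the empirical moments of \((\hat{\beta}_0, \hat{\beta}_1)\) with \((\beta_0, \beta_1)\) and \(\sigma^2 \mathbf{V}_n\).

set.seed(1289)
n <- 100; sigma <- 1
x <- runif(n, 0, 3)                       # fixed, illustrative design

one_fit <- function() {
  y <- -2 + 3 * x + rnorm(n, sd = sigma)  # y = -2 * 1 + 3 * x + eps
  coef(lm(y ~ x))
}
betas <- t(replicate(1000, one_fit()))    # 1000 simulated estimates

colMeans(betas)                           # should be close to (-2, 3)
cov(betas)                                # should be close to sigma^2 * V_n

# Theoretical covariance sigma^2 * V_n
s2x <- mean((x - mean(x))^2)
Vn  <- matrix(c(1 + mean(x)^2 / s2x, -mean(x) / s2x,
                -mean(x) / s2x,      1 / s2x), nrow = 2) / n
sigma^2 * Vn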
\[ \hat{\beta}_0 \sim \mathcal{N}\left(\beta_0, \frac{\sigma^2}{n}\left(1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2}\right)\right) \quad \hat{\beta}_1 \sim \mathcal{N}\left(\beta_1, \frac{\sigma^2}{n}\frac{1}{s_{\mathbf{x}}^2}\right) \]
\[ \hat{\sigma}^2 = \frac{1}{n-2}\sum_{i = 1}^n \hat{\epsilon}_i^2 \qquad \mathbb{E}[\hat{\sigma}^2] = \sigma^2 \]
\[ \frac{n-2}{\sigma^2}\hat{\sigma}^2 \sim \chi^2_{n-2} \]
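The chi-squared result can also be illustrated by simulation (same illustrative setup): compute \((n-2)\hat{\sigma}^2/\sigma^2\) over many replications and compare its moments with those of a \(\chi^2_{n-2}\).

set.seed(1289)
n <- 100; sigma <- 1
x <- runif(n, 0, 3)

stat <- replicate(1000, {
  y <- -2 + 3 * x + rnorm(n, sd = sigma)
  (n - 2) * sigma(lm(y ~ x))^2 / sigma^2  # (n - 2) * hat(sigma)^2 / sigma^2
})

c(mean = mean(stat), var = var(stat))     # empirical moments
c(mean = n - 2, var = 2 * (n - 2))        # chi^2_{n-2} moments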
Let \(X_1, \dotsc, X_p\) be \(p\) iid standard normal r.v.: \(X_i \sim \mathcal{N}(0, 1)\). Then \[ X = \sum_{i = 1}^p X_i^2 \sim \chi^2_p \]
is a Chi squared r.v. with \(p\) degrees of freedom.
We have: \[ \mathbb{E}[X] = p \quad \mathbb{Var}[X] = 2p. \]
For large \(p\), it is approximately Gaussian with matching moments: \((X - p)/\sqrt{2p}\) converges in distribution to \(\mathcal{N}(0, 1)\).
When \(\sigma^2\) is known:
\[ \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \beta_0\\ \beta_1 \end{pmatrix}; \sigma^2 \mathbf{V}_n \right) \qquad \mathbf{V}_n= \frac{1}{n} \begin{pmatrix} 1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2} & -\frac{\bar{x}}{s_{\mathbf{x}}^2}\\ -\frac{\bar{x}}{s_{\mathbf{x}}^2} & \frac{1}{s_{\mathbf{x}}^2} \end{pmatrix} \]
i.e.
\[ \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\sigma^2(1 + \bar{x}^2/s_{\mathbf{x}}^2)/n}} \sim \mathcal{N}(0, 1) \qquad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2(1/s_{\mathbf{x}}^2)/n}} \sim \mathcal{N}(0, 1) \]
When \(\sigma^2\) is known:
\[ \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\sigma^2(1 + \bar{x}^2/s_{\mathbf{x}}^2)/n}} \sim \mathcal{N}(0, 1) \qquad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2(1/s_{\mathbf{x}}^2)/n}} \sim \mathcal{N}(0, 1) \]
When \(\sigma^2\) is unknown: \[ \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\hat{\sigma}^2(1 + \bar{x}^2/s_{\mathbf{x}}^2)/n}} \sim \mathcal{T}_{n-2} \qquad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\hat{\sigma}^2(1/s_{\mathbf{x}}^2)/n}} \sim \mathcal{T}_{n-2} \]
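The denominators in these Student statistics are exactly the "Std. Error" column of summary(lm()). A minimal check on simulated data (names and setup are illustrative):

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)

s2x <- mean((x - mean(x))^2)
se0 <- sqrt(sigma(fit)^2 * (1 + mean(x)^2 / s2x) / n)  # estimated se of beta0_hat
se1 <- sqrt(sigma(fit)^2 * (1 / s2x) / n)              # estimated se of beta1_hat

cbind(by_hand = c(se0, se1),
      summary = summary(fit)$coefficients[, "Std. Error"])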
Let \(Z\) be a standard normal r.v.: \(Z \sim \mathcal{N}(0, 1)\),
and \(X\) a chi-squared r.v. with \(p\) degrees of freedom: \(X \sim \chi^2_p\), with \(Z\) and \(X\) independent. Then \[
T = \frac{Z}{\sqrt{X/p}} \sim \mathcal{T}_p
\]
is a Student r.v. with \(p\) degrees of freedom.
We have (for \(p > 2\)): \[ \mathbb{E}[T] = 0 \quad \mathbb{Var}[T] = \frac{p}{p-2}. \]
It converges in distribution towards the standard Gaussian.
\[ \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\hat{\sigma}^2_0}} \sim \mathcal{T}_{n-2} \qquad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\hat{\sigma}^2_1}} \sim \mathcal{T}_{n-2}\\ \frac{n-2}{\sigma^2}\hat{\sigma}^2 \sim \chi^2_{n-2} \]
with \(\hat{\sigma}^2_0 = \frac{\hat{\sigma}^2}{n}\left(1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2}\right)\) and \(\hat{\sigma}^2_1 = \frac{\hat{\sigma}^2}{n}\frac{1}{s_{\mathbf{x}}^2}\).
\[\mathbf{y} = -2 \cdot \mathbb{1} + 3 \cdot \mathbf{x} + \boldsymbol{\epsilon}\]
R output:
linreg <- lm(y_test ~ x_test)
summary(linreg)
## 
## Call:
## lm(formula = y_test ~ x_test)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.58688 -0.66510  0.05019  0.69597  2.86079 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.9448     0.1017  -19.12   <2e-16 ***
## x_test        3.0979     0.0866   35.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9812 on 98 degrees of freedom
## Multiple R-squared:  0.9289, Adjusted R-squared:  0.9281 
## F-statistic: 1280 on 1 and 98 DF,  p-value: < 2.2e-16
Recall that the prediction error \(\hat{\epsilon}_{n+1} = y_{n+1} - \hat{y}_{n+1}\) is such that: \[ \begin{aligned} \mathbb{E}[\hat{\epsilon}_{n+1}] &= 0\\ \mathbb{Var}[\hat{\epsilon}_{n+1}] &= \sigma^2 \left(1 + \frac{1}{n} + \frac{1}{ns_{\mathbf{x}}^2} (x_{n+1} - \bar{x})^2\right)\\ \end{aligned} \]
We get (variance known):
\[ y_{n+1} - \hat{y}_{n+1} \sim \mathcal{N}\left(0, \sigma^2 \left(1 + \frac{1}{n} + \frac{1}{ns_{\mathbf{x}}^2} (x_{n+1} - \bar{x})^2\right)\right) \]
and (variance unknown): \[ \frac{y_{n+1} - \hat{y}_{n+1}}{\sqrt{\hat{\sigma}^2 \left(1 + \frac{1}{n} + \frac{1}{ns_{\mathbf{x}}^2} (x_{n+1} - \bar{x})^2\right)}} \sim \mathcal{T}_{n-2} \]
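A sketch checking the Student-based prediction interval against predict(..., interval = "prediction") on simulated data (the new point \(x_{n+1}\) and all names are illustrative):

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)

x_new <- 1.5                                  # hypothetical new design point
y_hat <- predict(fit, data.frame(x = x_new))  # point prediction

# Prediction interval by hand, from the formula above
s2x     <- mean((x - mean(x))^2)
se_pred <- sqrt(sigma(fit)^2 * (1 + 1 / n + (x_new - mean(x))^2 / (n * s2x)))
alpha   <- 0.05
by_hand <- y_hat + c(-1, 1) * qt(1 - alpha / 2, df = n - 2) * se_pred

rbind(by_hand,
      predict(fit, data.frame(x = x_new), interval = "prediction")[, c("lwr", "upr")])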
library(here)
ad <- read.csv(here("data", "Advertising.csv"), row.names = "X")
head(ad)
##      TV radio newspaper sales
## 1 230.1  37.8      69.2  22.1
## 2  44.5  39.3      45.1  10.4
## 3  17.2  45.9      69.3   9.3
## 4 151.5  41.3      58.5  18.5
## 5 180.8  10.8      58.4  12.9
## 6   8.7  48.9      75.0   7.2
TV, radio, newspaper: advertising budgets (thousands of $)
sales: number of sales (thousands of units)

fit_tv <- lm(sales ~ TV, data = ad)
plot(ad$TV, ad$sales)
abline(fit_tv, col = "blue")
summary(fit_tv)
## 
## Call:
## lm(formula = sales ~ TV, data = ad)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3860 -1.9545 -0.1913  2.0671  7.2124 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.032594   0.457843   15.36   <2e-16 ***
## TV           0.047537   0.002691   17.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared:  0.6119, Adjusted R-squared:  0.6099 
## F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16
## lm fit
fit_tv <- lm(sales ~ TV, data = ad)
plot(ad$TV, ad$sales)
abline(fit_tv, col = "black")
## colors
contrast_col <- hcl.colors(2)
## Confidence interval
newx <- seq(0, 300, by = 1)
conf_int <- predict(fit_tv, data.frame(TV = newx), interval = "conf")
# Plot
lines(newx, conf_int[, 2], col = contrast_col[1], lty = 2, lwd = 2)
lines(newx, conf_int[, 3], col = contrast_col[1], lty = 2, lwd = 2)
# Prediction intervals
pred_int <- predict(fit_tv, data.frame(TV = newx), interval = "pred")
# Plot
lines(newx, pred_int[, 2], col = contrast_col[2], lty = 2, lwd = 2)
lines(newx, pred_int[, 3], col = contrast_col[2], lty = 2, lwd = 2)
When \(\sigma^2\) is known:
\[ \hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} \sim \mathcal{N}\left( \boldsymbol{\beta} = \begin{pmatrix} \beta_0\\ \beta_1 \end{pmatrix}; \sigma^2 \mathbf{V}_n \right) \]
When \(\sigma^2\) is unknown: \[ \frac{1}{2\hat{\sigma}^2}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^T \mathbf{V}_n^{-1}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \sim \mathcal{F}^2_{n-2} \]
Let \(U_1 \sim \chi^2_{p_1}\) and \(U_2 \sim \chi^2_{p_2}\), \(U_1\) and \(U_2\) independent. Then \[ F = \frac{U_1/p_1}{U_2/p_2} \sim \mathcal{F}^{p_1}_{p_2} \]
is a Fisher r.v. with \((p_1, p_2)\) degrees of freedom.
We have (for \(p_2 > 2\)): \[ \mathbb{E}[F] = \frac{p_2}{p_2 - 2}. \]
When \(p_2\) goes to infinity, \(\mathcal{F}^{p_1}_{p_2}\) converges in distribution to \(\frac{1}{p_1}\chi^2_{p_1}\).
\[ \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\hat{\sigma}^2_0}} \sim \mathcal{T}_{n-2} \qquad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\hat{\sigma}^2_1}} \sim \mathcal{T}_{n-2} \]
\[ \begin{aligned} \beta_0 &\in \left[\hat{\beta}_0 \pm t_{n-2}(1 - \alpha/2) \sqrt{\hat{\sigma}_0^2}\right]\\ \beta_1 &\in \left[\hat{\beta}_1 \pm t_{n-2}(1 - \alpha/2) \sqrt{\hat{\sigma}_1^2}\right]\\ \end{aligned} \]
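These are the intervals returned by confint(). A quick illustrative comparison on simulated data (setup assumed, not from the slides) between the by-hand Student intervals and confint():

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)

alpha <- 0.05
est <- coef(fit)
se  <- summary(fit)$coefficients[, "Std. Error"]
tq  <- qt(1 - alpha / 2, df = n - 2)

cbind(lwr = est - tq * se, upr = est + tq * se)  # by hand
confint(fit, level = 1 - alpha)                  # should match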
\[ \frac{1}{2\hat{\sigma}^2}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^T \mathbf{V}_n^{-1}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \sim \mathcal{F}^2_{n-2} \]
\[ \frac{1}{2\hat{\sigma}^2}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^T \mathbf{V}_n^{-1}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \leq f_{n-2}^2(1 - \alpha) \]
Since \(\mathbf{V}_n^{-1}\) is positive definite, this is the equation of an elliptic region.
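A sketch of this region on simulated data, where the true \(\boldsymbol{\beta}\) is known (all names and the setup are illustrative): compute the quadratic form using vcov(fit) \(= \hat{\sigma}^2 \mathbf{V}_n\) and compare it to the Fisher quantile.

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)      # true beta = (-2, 3)
fit <- lm(y ~ x)

beta_true <- c(-2, 3)
d <- coef(fit) - beta_true

# (1 / (2 hat(sigma)^2)) * d' V_n^{-1} d, with vcov(fit) = hat(sigma)^2 * V_n
stat <- as.numeric(t(d) %*% solve(vcov(fit)) %*% d) / 2
stat <= qf(0.95, df1 = 2, df2 = n - 2)  # TRUE for about 95% of simulated datasets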
\[\mathbf{y} = -2 \cdot \mathbb{1} + 3 \cdot \mathbf{x} + \boldsymbol{\epsilon}\]
Difference between: