24/01/2024
\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \forall 1 \leq i \leq n\]
[Scatter plot: sales vs TV]
\[ (\hat{\beta}_0, \hat{\beta}_1) = \underset{(\beta_0, \beta_1) \in \mathbb{R}^2}{\operatorname{argmin}} \left\{ \sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \right\}\]
\[ \begin{aligned} \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1\bar{x} & \hat{\beta}_1 &= \frac{s_{\mathbf{y},\mathbf{x}}^2}{s_{\mathbf{x}}^2} \\ \mathbb{E}[\hat{\beta}_0] &= \beta_0 & \mathbb{E}[\hat{\beta}_1] &= \beta_1 \\ \mathbb{Var}[\hat{\beta}_0] &= \frac{\sigma^2}{n} \left( 1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2} \right) & \mathbb{Var}[\hat{\beta}_1] &= \frac{\sigma^2}{n}\frac{1}{s_{\mathbf{x}}^2} \\ \mathbb{Cov}[\hat{\beta}_0; \hat{\beta}_1] &= - \frac{\sigma^2}{n} \frac{\bar{x}}{s_{\mathbf{x}}^2} \end{aligned} \]
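These closed-form expressions are easy to check numerically. Below is a minimal R sketch on simulated data (the design, sample size and variable names are illustrative assumptions, not taken from the slides), computing \(\hat{\beta}_0\) and \(\hat{\beta}_1\) from empirical means and (co)variances and comparing them to lm():

set.seed(1289)
n <- 100
x <- runif(n, 0, 3)                    # illustrative design points
y <- -2 + 3 * x + rnorm(n, sd = 1)     # simulate y_i = beta_0 + beta_1 x_i + eps_i

# Closed-form OLS estimates (the 1/(n-1) factors in cov() and var() cancel)
beta1_hat <- cov(y, x) / var(x)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Same estimates from lm()
rbind(by_hand = c(beta0_hat, beta1_hat),
      lm      = coef(lm(y ~ x)))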
Confidence interval for \((\beta_0, \beta_1)\)?
Can we test \(\beta_1 = 0\) (i.e. no linear trend)?
\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \forall 1 \leq i \leq n\]
[Scatter plot: sales vs TV]
\[ \epsilon_i \underset{\text{iid}}{\operatorname{\sim}} \mathcal{N}(0, \sigma^2) \quad \text{for} \quad 1 \leq i \leq n \]
Let \(X\) be a Gaussian r.v. with expectation \(\mu\) and variance \(\sigma^2\): \[ X \sim \mathcal{N}(\mu, \sigma^2). \]
It admits a probability density function (pdf): \[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \quad \forall x \in \mathbb{R}, \]
And a characteristic function: \[ \phi_{X}(t) = \mathbb{E}[e^{itX}] = e^{i\mu t - \frac{\sigma^2 t^2}{2}} \quad \forall t \in \mathbb{R}. \]
If \(X \sim \mathcal{N}(\mu, \sigma^2)\), then for any \((a, b) \in \mathbb{R}^2\),
\[
Y = aX + b \sim \mathcal{N}(a\mu + b, a^2\sigma^2).
\]
Model: \[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i \quad \text{with} \quad \epsilon_i \underset{\text{iid}}{\operatorname{\sim}} \mathcal{N}(0, \sigma^2) \quad \text{for} \quad 1 \leq i \leq n \]
Distribution of \(y_i\): \[ y_i \underset{\text{iid}}{\operatorname{\sim}} \mathcal{N}(\beta_0 + \beta_1 x_i, \sigma^2) \quad \text{for} \quad 1 \leq i \leq n \]
If \(X\) is a r.v. that admits a density \(f_{\theta}\) that depends on a parameter \(\theta\), then the likelihood function is the density, seen as a function of \(\theta\) given an outcome \(X = x\): \[ \theta \mapsto L(\theta | x) = f_{\theta}(x). \]
Example:
Assume that we observe one realization \(x_{obs}\) of a Gaussian random variable \(X\), with unknown expectation \(\mu\), but known variance \(\sigma^2 = 1\).
Then the likelihood of the observation \(x_{obs}\) is the function: \[
\mu \mapsto \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_{obs} - \mu)^2}{2\sigma^2}\right).
\]
The Maximum Likelihood (ML) estimator \(\hat{\theta}\) of the parameter \(\theta\) is the one that maximizes the likelihood function in \(\theta\), given an observation \(x\):
\[
\hat{\theta} = \underset{\theta}{\operatorname{argmax}} L(\theta | x).
\]
Example:
Let \(x_{obs}\) be one realization of a Gaussian r.v. \(X\), with unknown expectation \(\mu\), but known variance \(\sigma^2 = 1\).
Then the ML estimator \(\hat{\mu}\) of \(\mu\) is: \[
\hat{\mu} = \underset{\mu \in \mathbb{R}}{\operatorname{argmax}} \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_{obs} - \mu)^2}{2\sigma^2}\right).
\]
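For this toy example the maximizer is of course \(\hat{\mu} = x_{obs}\). A quick illustrative check in R (the observed value is hypothetical), maximizing the likelihood numerically with optimize():

x_obs <- 1.7                                         # hypothetical observation
lik <- function(mu) dnorm(x_obs, mean = mu, sd = 1)  # likelihood, sigma^2 = 1

# Numerical maximization over an interval containing x_obs
optimize(lik, interval = c(-10, 10), maximum = TRUE)$maximum
# should be (approximately) equal to x_obs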
Distribution of \(y_i\): \[ y_i \underset{\text{iid}}{\operatorname{\sim}} \mathcal{N}(\beta_0 + \beta_1 x_i, \sigma^2) \quad \text{for} \quad 1 \leq i \leq n \]
Likelihood of \(\mathbf{y} = (y_1, \dotsc, y_n)^T\):
\[ \begin{aligned} &L(\beta_0, \beta_1, \sigma^2 | y_1, \dotsc, y_n) = \prod_{i = 1}^n L(\beta_0, \beta_1, \sigma^2 | y_i) & [ind.]\\ \end{aligned} \]
and \[ \begin{aligned} L(\beta_0, \beta_1, \sigma^2 | y_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right) \end{aligned} \]
\[ \begin{aligned} &\log L(\beta_0, \beta_1, \sigma^2 | y_1, \dotsc, y_n) = \\ &\qquad - \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_i)^2\\ \end{aligned} \]
\[ \begin{aligned} &\log L(\beta_0, \beta_1, \sigma^2 | \mathbf{y}) = \\ &\qquad - \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\underbrace{\sum_{i = 1}^n (y_i - \beta_0 - \beta_1 x_i)^2}_{\text{Sum of Squares } SS(\beta_0, \beta_1)}\\ \end{aligned} \]
\[ (\hat{\beta}_0^{ML}, \hat{\beta}_1^{ML}) = (\hat{\beta}_0^{OLS}, \hat{\beta}_1^{OLS}) \]
\(\hookrightarrow\) The ML estimators are the same as the OLS estimators.
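A sketch of this equivalence on simulated data (setup and names are illustrative assumptions): maximize the log-likelihood numerically with optim() and compare the result to lm().

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)

# Negative log-likelihood, parametrized as (beta0, beta1, log(sigma))
negloglik <- function(par) {
  -sum(dnorm(y, mean = par[1] + par[2] * x, sd = exp(par[3]), log = TRUE))
}

ml_fit <- optim(c(0, 0, 0), negloglik)
rbind(ML  = ml_fit$par[1:2],
      OLS = coef(lm(y ~ x)))
# the two rows should agree up to the optimizer's tolerance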
\[ \begin{aligned} \hat{\sigma}^2_{ML} &= \underset{\sigma^2 \in \mathbb{R}_+^*}{\operatorname{argmax}} \log L(\hat{\beta}_0, \hat{\beta}_1, \sigma^2 | \mathbf{y})\\ &= \underset{\sigma^2 \in \mathbb{R}_+^*}{\operatorname{argmax}} - \frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i = 1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2\\ &= \underset{\sigma^2 \in \mathbb{R}_+^*}{\operatorname{argmin}} \frac{n}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i = 1}^n \hat{\epsilon}_i^2\\ \end{aligned} \]
\[ \frac{\partial}{\partial \sigma^2} \left[\frac{n}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i = 1}^n \hat{\epsilon}_i^2\right] = \cdots = 0 \]
\[ \begin{aligned} \frac{\partial}{\partial \sigma^2} \left[\frac{n}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i = 1}^n \hat{\epsilon}_i^2\right] &= \frac{n}{2} \frac{1}{\sigma^2} - \frac{1}{2\sigma^4}\sum_{i = 1}^n \hat{\epsilon}_i^2 = 0 \end{aligned} \] And we get:
\[ \hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i = 1}^n \hat{\epsilon}_i^2 \]
\[ \hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i = 1}^n \hat{\epsilon}_i^2 \neq \frac{1}{n-2}\sum_{i = 1}^n \hat{\epsilon}_i^2 = \hat{\sigma}^2 \]
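In R, sigma(fit) returns the square root of the unbiased estimator (division by \(n - 2\)), not the ML estimator. A minimal illustrative check on simulated data (setup assumed, not from the slides):

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)
res <- residuals(fit)

sigma2_ml       <- sum(res^2) / n         # ML estimator: divide by n
sigma2_unbiased <- sum(res^2) / (n - 2)   # unbiased estimator: divide by n - 2

c(ML = sigma2_ml, unbiased = sigma2_unbiased, lm = sigma(fit)^2)
# sigma(fit)^2 matches the (n - 2) version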
\[ \begin{aligned} \mathbb{E}\left[ \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} \right] &= \begin{pmatrix} \beta_0\\ \beta_1 \end{pmatrix} \\ \mathbb{Var}\left[ \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} \right] &= \frac{\sigma^2}{n} \begin{pmatrix} 1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2} & -\frac{\bar{x}}{s_{\mathbf{x}}^2}\\ -\frac{\bar{x}}{s_{\mathbf{x}}^2} & \frac{1}{s_{\mathbf{x}}^2} \end{pmatrix} = \sigma^2 \mathbf{V}_n \end{aligned} \]
When the variance \(\sigma^2\) is known, we find that the vector of estimators \((\hat{\beta}_0, \hat{\beta}_1)\) is a Gaussian vector, with expectation \((\beta_0, \beta_1)\) and variance \(\sigma^2 \mathbf{V}_n\):
\[ \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \beta_0\\ \beta_1 \end{pmatrix}; \sigma^2 \mathbf{V}_n \right) \]
Simulate according to the model: \[\mathbf{y} = -2 \cdot \mathbb{1} + 3 \cdot \mathbf{x} + \boldsymbol{\epsilon}\]
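A minimal R sketch of this simulation (the design x, the noise level, and the number of replications are illustrative assumptions): simulate many datasets, refit the model on each, and compare the empirical moments of \((\hat{\beta}_0, \hat{\beta}_1)\) with \((\beta_0, \beta_1)\) and \(\sigma^2 \mathbf{V}_n\).

set.seed(1289)
n <- 100; sigma <- 1
x <- runif(n, 0, 3)                       # fixed, illustrative design

one_fit <- function() {
  y <- -2 + 3 * x + rnorm(n, sd = sigma)  # y = -2 * 1 + 3 * x + eps
  coef(lm(y ~ x))
}
betas <- t(replicate(1000, one_fit()))    # 1000 simulated estimates

colMeans(betas)                           # should be close to (-2, 3)
cov(betas)                                # should be close to sigma^2 * V_n

# Theoretical covariance sigma^2 * V_n
s2x <- mean((x - mean(x))^2)
Vn  <- matrix(c(1 + mean(x)^2 / s2x, -mean(x) / s2x,
                -mean(x) / s2x,      1 / s2x), nrow = 2) / n
sigma^2 * Vn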
\[ \hat{\beta}_0 \sim \mathcal{N}\left(\beta_0, \frac{\sigma^2}{n}\left(1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2}\right)\right) \quad \hat{\beta}_1 \sim \mathcal{N}\left(\beta_1, \frac{\sigma^2}{n}\frac{1}{s_{\mathbf{x}}^2}\right) \]
\[ \hat{\sigma}^2 = \frac{1}{n-2}\sum_{i = 1}^n \hat{\epsilon}_i^2 \qquad \mathbb{E}[\hat{\sigma}^2] = \sigma^2 \]
\[ \frac{n-2}{\sigma^2}\hat{\sigma}^2 \sim \chi^2_{n-2} \]
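The chi-squared result can also be illustrated by simulation (same illustrative setup): compute \((n-2)\hat{\sigma}^2/\sigma^2\) over many replications and compare its moments with those of a \(\chi^2_{n-2}\).

set.seed(1289)
n <- 100; sigma <- 1
x <- runif(n, 0, 3)

stat <- replicate(1000, {
  y <- -2 + 3 * x + rnorm(n, sd = sigma)
  (n - 2) * sigma(lm(y ~ x))^2 / sigma^2  # (n - 2) * hat(sigma)^2 / sigma^2
})

c(mean = mean(stat), var = var(stat))     # empirical moments
c(mean = n - 2, var = 2 * (n - 2))        # chi^2_{n-2} moments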
Let \(X_1, \dotsc, X_p\) be \(p\) iid standard normal r.v.: \(X_i \sim \mathcal{N}(0, 1)\). Then \[ X = \sum_{i = 1}^p X_i^2 \sim \chi^2_p \]
is a Chi squared r.v. with \(p\) degrees of freedom.
We have: \[ \mathbb{E}[X] = p \quad \mathbb{Var}[X] = 2p. \]
For large \(p\), it is approximately Gaussian with matching moments: \((X - p)/\sqrt{2p}\) converges in distribution to \(\mathcal{N}(0, 1)\).
When \(\sigma^2\) is known:
\[ \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \beta_0\\ \beta_1 \end{pmatrix}; \sigma^2 \mathbf{V}_n \right) \qquad \mathbf{V}_n= \frac{1}{n} \begin{pmatrix} 1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2} & -\frac{\bar{x}}{s_{\mathbf{x}}^2}\\ -\frac{\bar{x}}{s_{\mathbf{x}}^2} & \frac{1}{s_{\mathbf{x}}^2} \end{pmatrix} \]
i.e.
\[ \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\sigma^2(1 + \bar{x}^2/s_{\mathbf{x}}^2)/n}} \sim \mathcal{N}(0, 1) \qquad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2(1/s_{\mathbf{x}}^2)/n}} \sim \mathcal{N}(0, 1) \]
When \(\sigma^2\) is known:
\[ \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\sigma^2(1 + \bar{x}^2/s_{\mathbf{x}}^2)/n}} \sim \mathcal{N}(0, 1) \qquad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2(1/s_{\mathbf{x}}^2)/n}} \sim \mathcal{N}(0, 1) \]
When \(\sigma^2\) is unknown: \[ \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\hat{\sigma}^2(1 + \bar{x}^2/s_{\mathbf{x}}^2)/n}} \sim \mathcal{T}_{n-2} \qquad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\hat{\sigma}^2(1/s_{\mathbf{x}}^2)/n}} \sim \mathcal{T}_{n-2} \]
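The denominators in these Student statistics are exactly the "Std. Error" column of summary(lm()). A minimal check on simulated data (names and setup are illustrative):

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)

s2x <- mean((x - mean(x))^2)
se0 <- sqrt(sigma(fit)^2 * (1 + mean(x)^2 / s2x) / n)  # estimated se of beta0_hat
se1 <- sqrt(sigma(fit)^2 * (1 / s2x) / n)              # estimated se of beta1_hat

cbind(by_hand = c(se0, se1),
      summary = summary(fit)$coefficients[, "Std. Error"])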
Let \(Z\) be a standard normal r.v.: \(Z \sim \mathcal{N}(0, 1)\),
and \(X\) a chi-squared r.v. with \(p\) degrees of freedom: \(X \sim \chi^2_p\), with \(Z\) and \(X\) independent. Then \[
T = \frac{Z}{\sqrt{X/p}} \sim \mathcal{T}_p
\]
is a Student r.v. with \(p\) degrees of freedom.
We have (for \(p > 2\)): \[ \mathbb{E}[T] = 0 \quad \mathbb{Var}[T] = \frac{p}{p-2}. \]
It converges in distribution towards the standard Gaussian.
\[ \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\hat{\sigma}^2_0}} \sim \mathcal{T}_{n-2} \qquad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\hat{\sigma}^2_1}} \sim \mathcal{T}_{n-2}\\ \frac{n-2}{\sigma^2}\hat{\sigma}^2 \sim \chi^2_{n-2} \]
with \(\hat{\sigma}^2_0 = \frac{\hat{\sigma}^2}{n}\left(1 + \frac{\bar{x}^2}{s_{\mathbf{x}}^2}\right)\) and \(\hat{\sigma}^2_1 = \frac{\hat{\sigma}^2}{n}\frac{1}{s_{\mathbf{x}}^2}\).
\[\mathbf{y} = -2 \cdot \mathbb{1} + 3 \cdot \mathbf{x} + \boldsymbol{\epsilon}\]
R output:
linreg <- lm(y_test ~ x_test)
summary(linreg)
## 
## Call:
## lm(formula = y_test ~ x_test)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.58688 -0.66510  0.05019  0.69597  2.86079 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.9448     0.1017  -19.12   <2e-16 ***
## x_test        3.0979     0.0866   35.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9812 on 98 degrees of freedom
## Multiple R-squared:  0.9289, Adjusted R-squared:  0.9281 
## F-statistic: 1280 on 1 and 98 DF,  p-value: < 2.2e-16
Recall that the prediction error \(\hat{\epsilon}_{n+1} = y_{n+1} - \hat{y}_{n+1}\) is such that: \[ \begin{aligned} \mathbb{E}[\hat{\epsilon}_{n+1}] &= 0\\ \mathbb{Var}[\hat{\epsilon}_{n+1}] &= \sigma^2 \left(1 + \frac{1}{n} + \frac{1}{ns_{\mathbf{x}}^2} (x_{n+1} - \bar{x})^2\right)\\ \end{aligned} \]
We get (variance known):
\[ y_{n+1} - \hat{y}_{n+1} \sim \mathcal{N}\left(0, \sigma^2 \left(1 + \frac{1}{n} + \frac{1}{ns_{\mathbf{x}}^2} (x_{n+1} - \bar{x})^2\right)\right) \]
and (variance unknown): \[ \frac{y_{n+1} - \hat{y}_{n+1}}{\sqrt{\hat{\sigma}^2 \left(1 + \frac{1}{n} + \frac{1}{ns_{\mathbf{x}}^2} (x_{n+1} - \bar{x})^2\right)}} \sim \mathcal{T}_{n-2} \]
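A sketch checking the Student-based prediction interval against predict(..., interval = "prediction") on simulated data (the new point \(x_{n+1}\) and all names are illustrative):

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)

x_new <- 1.5                                  # hypothetical new design point
y_hat <- predict(fit, data.frame(x = x_new))  # point prediction

# Prediction interval by hand, from the formula above
s2x     <- mean((x - mean(x))^2)
se_pred <- sqrt(sigma(fit)^2 * (1 + 1 / n + (x_new - mean(x))^2 / (n * s2x)))
alpha   <- 0.05
by_hand <- y_hat + c(-1, 1) * qt(1 - alpha / 2, df = n - 2) * se_pred

rbind(by_hand,
      predict(fit, data.frame(x = x_new), interval = "prediction")[, c("lwr", "upr")])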
library(here)
ad <- read.csv(here("data", "Advertising.csv"), row.names = "X")
head(ad)
##      TV radio newspaper sales
## 1 230.1  37.8      69.2  22.1
## 2  44.5  39.3      45.1  10.4
## 3  17.2  45.9      69.3   9.3
## 4 151.5  41.3      58.5  18.5
## 5 180.8  10.8      58.4  12.9
## 6   8.7  48.9      75.0   7.2
TV, radio, newspaper: advertising budgets (thousands of $)
sales: number of sales (thousands of units)

fit_tv <- lm(sales ~ TV, data = ad)
plot(ad$TV, ad$sales)
abline(fit_tv, col = "blue")
summary(fit_tv)
## 
## Call:
## lm(formula = sales ~ TV, data = ad)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3860 -1.9545 -0.1913  2.0671  7.2124 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.032594   0.457843   15.36   <2e-16 ***
## TV           0.047537   0.002691   17.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared:  0.6119, Adjusted R-squared:  0.6099 
## F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16
## lm fit
fit_tv <- lm(sales ~ TV, data = ad)
plot(ad$TV, ad$sales)
abline(fit_tv, col = "black")
## colors
contrast_col <- hcl.colors(2)
## Confidence interval
newx <- seq(0, 300, by = 1)
conf_int <- predict(fit_tv, data.frame(TV = newx), interval = "conf")
# Plot
lines(newx, conf_int[, 2], col = contrast_col[1], lty = 2, lwd = 2)
lines(newx, conf_int[, 3], col = contrast_col[1], lty = 2, lwd = 2)
# Prediction intervals
pred_int <- predict(fit_tv, data.frame(TV = newx), interval = "pred")
# Plot
lines(newx, pred_int[, 2], col = contrast_col[2], lty = 2, lwd = 2)
lines(newx, pred_int[, 3], col = contrast_col[2], lty = 2, lwd = 2)
When \(\sigma^2\) is known:
\[ \hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} \sim \mathcal{N}\left( \boldsymbol{\beta} = \begin{pmatrix} \beta_0\\ \beta_1 \end{pmatrix}; \sigma^2 \mathbf{V}_n \right) \]
When \(\sigma^2\) is unknown: \[ \frac{1}{2\hat{\sigma}^2}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^T \mathbf{V}_n^{-1}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \sim \mathcal{F}^2_{n-2} \]
Let \(U_1 \sim \chi^2_{p_1}\) and \(U_2 \sim \chi^2_{p_2}\), \(U_1\) and \(U_2\) independent. Then \[ F = \frac{U_1/p_1}{U_2/p_2} \sim \mathcal{F}^{p_1}_{p_2} \]
is a Fisher r.v. with \((p_1, p_2)\) degrees of freedom.
We have (for \(p_2 > 2\)): \[ \mathbb{E}[F] = \frac{p_2}{p_2 - 2}. \]
When \(p_2\) goes to infinity, \(\mathcal{F}^{p_1}_{p_2}\) converges in distribution to \(\frac{1}{p_1}\chi^2_{p_1}\).
\[ \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\hat{\sigma}^2_0}} \sim \mathcal{T}_{n-2} \qquad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\hat{\sigma}^2_1}} \sim \mathcal{T}_{n-2} \]
\[ \begin{aligned} \beta_0 &\in \left[\hat{\beta}_0 \pm t_{n-2}(1 - \alpha/2) \sqrt{\hat{\sigma}_0^2}\right]\\ \beta_1 &\in \left[\hat{\beta}_1 \pm t_{n-2}(1 - \alpha/2) \sqrt{\hat{\sigma}_1^2}\right]\\ \end{aligned} \]
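These are the intervals returned by confint(). A quick illustrative comparison on simulated data (setup assumed, not from the slides) between the by-hand Student intervals and confint():

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)

alpha <- 0.05
est <- coef(fit)
se  <- summary(fit)$coefficients[, "Std. Error"]
tq  <- qt(1 - alpha / 2, df = n - 2)

cbind(lwr = est - tq * se, upr = est + tq * se)  # by hand
confint(fit, level = 1 - alpha)                  # should match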
\[ \frac{1}{2\hat{\sigma}^2}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^T \mathbf{V}_n^{-1}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \sim \mathcal{F}^2_{n-2} \]
\[ \frac{1}{2\hat{\sigma}^2}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^T \mathbf{V}_n^{-1}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \leq f_{n-2}^2(1 - \alpha) \]
Since \(\mathbf{V}_n^{-1}\) is positive definite, this is the equation of an elliptic region.
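A sketch of this region on simulated data, where the true \(\boldsymbol{\beta}\) is known (all names and the setup are illustrative): compute the quadratic form using vcov(fit) \(= \hat{\sigma}^2 \mathbf{V}_n\) and compare it to the Fisher quantile.

set.seed(42)
n <- 100
x <- runif(n, 0, 3)
y <- -2 + 3 * x + rnorm(n, sd = 1)      # true beta = (-2, 3)
fit <- lm(y ~ x)

beta_true <- c(-2, 3)
d <- coef(fit) - beta_true

# (1 / (2 hat(sigma)^2)) * d' V_n^{-1} d, with vcov(fit) = hat(sigma)^2 * V_n
stat <- as.numeric(t(d) %*% solve(vcov(fit)) %*% d) / 2
stat <= qf(0.95, df1 = 2, df2 = n - 2)  # TRUE for about 95% of simulated datasets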
\[\mathbf{y} = -2 \cdot \mathbb{1} + 3 \cdot \mathbf{x} + \boldsymbol{\epsilon}\]
Difference between: