Paul Bastide - Ibrahim Bouzalmat
31/01/2024
```r
library(here)
ad <- read.csv(here("data", "Advertising.csv"), row.names = "X")
head(ad)
```

```
##      TV radio newspaper sales
## 1 230.1  37.8      69.2  22.1
## 2  44.5  39.3      45.1  10.4
## 3  17.2  45.9      69.3   9.3
## 4 151.5  41.3      58.5  18.5
## 5 180.8  10.8      58.4  12.9
## 6   8.7  48.9      75.0   7.2
```
- `TV`, `radio`, `newspaper`: advertising budgets (thousands of $)
- `sales`: number of sales (thousands of units)

```r
attach(ad)
par(mfrow = c(1, 3))
plot(TV, sales); plot(radio, sales); plot(newspaper, sales)
```
Simple linear regressions, one for each predictor:
$$\text{sales} \approx \beta_0^{TV} + \beta_1^{TV} \, \text{TV}$$
$$\text{sales} \approx \beta_0^{radio} + \beta_1^{radio} \, \text{radio}$$
$$\text{sales} \approx \beta_0^{newspaper} + \beta_1^{newspaper} \, \text{newspaper}$$

Multiple linear regression, with all predictors at once:
$$\text{sales} \approx \beta_0 + \beta_1 \, \text{TV} + \beta_2 \, \text{radio} + \beta_3 \, \text{newspaper}$$
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \forall \, 1 \leq i \leq n$$

- $y$: response (e.g. `sales`)
- $x$: predictor (e.g. `TV`)

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i, \quad \forall \, 1 \leq i \leq n$$
- $y$: response (e.g. `sales`)
- $x_1, \dots, x_p$: predictors (e.g. `TV`, `radio` and `newspaper`, i.e. $p = 3$)

In vector form, the simple regression reads:
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \beta_0 \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + \beta_1 \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i, \quad \forall \, 1 \leq i \leq n$$

$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \beta_0 \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + \beta_1 \begin{pmatrix} x_{11} \\ \vdots \\ x_{n1} \end{pmatrix} + \cdots + \beta_p \begin{pmatrix} x_{1p} \\ \vdots \\ x_{np} \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & \dots & x_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \dots & x_{np} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

$$y = X\beta + \epsilon$$
$$y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \epsilon_i, \quad \forall \, 1 \leq i \leq n,$$
i.e. $y = X\beta + \epsilon$.
With the intercept, the model is written as:
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & \dots & x_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \dots & x_{np} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
Without loss of generality, we write:
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_{11} & \dots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{np} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
We use this convention in the rest of the text.
$$y = X\beta + \epsilon$$
The column and row vectors of $X$ are written as:
$$X = \begin{pmatrix} x_1 & \cdots & x_p \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$
where $x_k$ with $1 \leq k \leq p$ denotes the $k$-th column of $X$ (a vector of $\mathbb{R}^n$), and $x_i$ with $1 \leq i \leq n$ denotes the $i$-th row (a row vector of dimension $1 \times p$); the index letter makes clear which one is meant.

For $1 \leq i \leq n$:
$$y_i = [X\beta]_i + \epsilon_i = x_i \beta + \epsilon_i$$

And:
$$y = \sum_{k=1}^{p} \beta_k x_k + \epsilon$$
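To make the matrix notation concrete, here is a minimal sketch (assuming the Advertising data `ad` loaded above) that builds the response vector and the design matrix with `model.matrix()`; the first column is the column of ones for the intercept.

```r
## Response vector and design matrix for the Advertising data
X <- model.matrix(~ TV + radio + newspaper, data = ad)  # n x p, first column = intercept
y <- ad$sales
dim(X)      # 200 rows, 4 columns (intercept, TV, radio, newspaper)
head(X, 3)
```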
For the model $y = X\beta + \epsilon$ with $\operatorname{rank}(X) = p$, there are $n - p$ degrees of freedom.
In a simple regression, we have:
$$y = \beta_0 \mathbb{1} + \beta_1 x + \epsilon = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
There are $n - 2$ degrees of freedom, but only one "explanatory variable" $x$.
If we write:
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & \dots & x_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \dots & x_{np} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix},$$
there are $n - (p + 1) = n - p - 1$ degrees of freedom, but only $p$ "explanatory variables" $x_1, \dots, x_p$.
If we write:
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_{11} & \dots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{np} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix},$$
there are $n - p$ degrees of freedom. If $x_1 = \mathbb{1}$ is the intercept, then there are $p - 1$ "explanatory variables" $x_2, \dots, x_p$.
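As a quick check of these conventions in R (a sketch, assuming `ad` is loaded), `df.residual()` returns $n$ minus the number of estimated coefficients, intercept included:

```r
fit_simple <- lm(sales ~ TV, data = ad)                 # intercept + one predictor
c(n = nrow(ad), df_residual = df.residual(fit_simple))  # 200 and 200 - 2 = 198
```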
Let $x = (x_1, \dots, x_n)^T$ and $y = (y_1, \dots, y_n)^T$ be two vectors of $\mathbb{R}^n$.

The squared Euclidean norm of $x$ is given by:
$$\|x\|^2 = \sum_{i=1}^{n} x_i^2 = x^T x$$

The Euclidean scalar product between $x$ and $y$ is given by:
$$\langle x ; y \rangle = \sum_{i=1}^{n} x_i y_i = x^T y = y^T x$$
We have the following formulas:
- $\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2\langle x ; y \rangle$
- $\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\langle x ; y \rangle$
- $\langle x ; y \rangle = \frac{1}{4}\left(\|x + y\|^2 - \|x - y\|^2\right)$
- $|\langle x ; y \rangle| \leq \|x\| \, \|y\|$ (Cauchy-Schwarz)
- $\|x + y\| \leq \|x\| + \|y\|$ (Triangle Inequality)
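These identities are easy to check numerically; a small sketch on random vectors:

```r
set.seed(42)
x <- rnorm(10); y <- rnorm(10)
sum(x * y)                                               # <x ; y>
(sum((x + y)^2) - sum((x - y)^2)) / 4                    # polarization identity: same value
abs(sum(x * y)) <= sqrt(sum(x^2) * sum(y^2))             # Cauchy-Schwarz: TRUE
sqrt(sum((x + y)^2)) <= sqrt(sum(x^2)) + sqrt(sum(y^2))  # triangle inequality: TRUE
```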
Let $y \in \mathbb{R}^n$, let $(x_1, \dots, x_p) \in \mathbb{R}^n \times \cdots \times \mathbb{R}^n$ be $p$ linearly independent vectors, and let $\mathcal{M}(X) = \operatorname{span}\{x_1, \dots, x_p\}$.

The orthogonal projection of $y$ on $\mathcal{M}(X)$ is defined as the unique vector $\operatorname{Proj}_{\mathcal{M}(X)}(y) \in \mathcal{M}(X)$ such that:
$$\operatorname{Proj}_{\mathcal{M}(X)}(y) = \underset{\substack{\tilde{y} = X\beta \\ \beta \in \mathbb{R}^p}}{\operatorname{argmin}} \; \|y - \tilde{y}\|^2.$$

It is also the unique vector such that $y - \operatorname{Proj}_{\mathcal{M}(X)}(y)$ is orthogonal to $\mathcal{M}(X)$, i.e. orthogonal to $x_k$ for all $1 \leq k \leq p$.

There is a unique matrix $P_X$ such that $\operatorname{Proj}_{\mathcal{M}(X)}(y) = P_X y$.
Definition:

An $n \times n$ matrix $U$ is said to be orthogonal if $U U^T = I_n$. The columns of $U$ then form an orthonormal basis of $\mathbb{R}^n$.
Property:
If $P$ is an orthogonal projection matrix, then there exist an orthogonal matrix $U$ and a diagonal matrix $\Delta$ (with diagonal entries equal to 0 or 1) such that:
$$P = U \Delta U^T$$
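As an illustration (a sketch, not from the slides): the projection onto $\operatorname{span}\{\mathbb{1}\}$ in $\mathbb{R}^n$ is the matrix with all entries $1/n$, and its eigen-decomposition has exactly the announced form.

```r
n <- 4
P <- matrix(1 / n, n, n)              # orthogonal projection onto span{1} in R^4
e <- eigen(P, symmetric = TRUE)
round(e$values, 10)                   # diagonal of Delta: 1, 0, 0, 0
max(abs(P - e$vectors %*% diag(e$values) %*% t(e$vectors)))  # P = U Delta U^T, up to rounding
```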
The OLS estimator $\hat{\beta}$ is given by:
$$\hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\operatorname{argmin}} \left\{ \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 \right\}$$
Goal: minimize the squared errors between the observations $y_i$ and the linear predictions $\sum_{j=1}^{p} \beta_j x_{ij}$:
$$\hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\operatorname{argmin}} \left\{ \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 \right\} = \underset{\beta \in \mathbb{R}^p}{\operatorname{argmin}} \left\{ \sum_{i=1}^{n} \big( y_i - (X\beta)_i \big)^2 \right\} = \underset{\beta \in \mathbb{R}^p}{\operatorname{argmin}} \; \|y - X\beta\|^2$$

hence

$$\hat{y} = X\hat{\beta} = \underset{\substack{\tilde{y} = X\beta \\ \beta \in \mathbb{R}^p}}{\operatorname{argmin}} \; \|y - \tilde{y}\|^2 = \operatorname{Proj}_{\mathcal{M}(X)}(y) = P_X y$$

$\hat{y}$ is the orthogonal projection of $y$ on $\mathcal{M}(X) = \operatorname{span}\{x_1, \dots, x_p\}$, and $P_X$ is the orthogonal projection matrix on $\mathcal{M}(X)$.
$$y = P_X y + (I_n - P_X) y,$$
with $P_X y$ and $(I_n - P_X) y$ orthogonal.

$$\hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\operatorname{argmin}} \; \|y - X\beta\|^2 \quad \text{and} \quad \hat{y} = X\hat{\beta} = P_X y$$
The OLS estimator is given by:
$$\hat{\beta} = (X^T X)^{-1} X^T y$$

And the matrix of orthogonal projection on $\mathcal{M}(X)$ is:
$$P_X = X (X^T X)^{-1} X^T$$
$\hat{y} = X\hat{\beta}$ is the orthogonal projection of $y$ on $\mathcal{M}(X)$. It is the only vector such that $y - X\hat{\beta}$ is orthogonal to $\mathcal{M}(X)$, i.e. such that $y - X\hat{\beta}$ is orthogonal to all the $x_k$:
$$\langle x_k ; y - X\hat{\beta} \rangle = x_k^T (y - X\hat{\beta}) = 0, \quad 1 \leq k \leq p,$$
i.e.
$$\begin{pmatrix} x_1 & \dots & x_p \end{pmatrix}^T (y - X\hat{\beta}) = \begin{pmatrix} x_1^T (y - X\hat{\beta}) \\ \vdots \\ x_p^T (y - X\hat{\beta}) \end{pmatrix} = 0,$$
and
$$X^T (y - X\hat{\beta}) = 0.$$

Hence:
$$X^T X \hat{\beta} = X^T y.$$

As $\operatorname{rank}(X) = p$, $X^T X$ is invertible, and:
$$\hat{\beta} = (X^T X)^{-1} X^T y$$

Then:
$$\hat{y} = P_X y = X\hat{\beta} = X (X^T X)^{-1} X^T y$$

As this equality is true for any $y$, we get:
$$P_X = X (X^T X)^{-1} X^T.$$
The minimum of the quadratic function $S$ is obtained at the point where the gradient is zero:
$$S(\beta) = \|y - X\beta\|^2 = \beta^T X^T X \beta - 2 \beta^T X^T y + y^T y$$
$$\nabla S(\hat{\beta}) = 2 X^T X \hat{\beta} - 2 X^T y = 0$$
$$X^T X \hat{\beta} = X^T y$$
Same conclusion as before.
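A minimal numerical check (a sketch, assuming `ad` is loaded): compute $\hat{\beta}$ from the closed-form formula and compare with `lm()`.

```r
X <- model.matrix(~ TV + radio + newspaper, data = ad)
y <- ad$sales
beta_hat <- solve(crossprod(X), crossprod(X, y))        # solves X^T X beta = X^T y
max(abs(crossprod(X, y - X %*% beta_hat)))              # X^T (y - X beta_hat) = 0, up to rounding
cbind(closed_form = drop(beta_hat),
      lm = coef(lm(sales ~ TV + radio + newspaper, data = ad)))
```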
Let $z$ be a random vector of dimension $n$.

The expectation of $z$ is an $n$ vector:
$$\mathbb{E}[z] = (\mathbb{E}[z_1], \cdots, \mathbb{E}[z_n])^T$$

The variance of $z$ is an $n \times n$ matrix:
$$\operatorname{Var}[z] = \big[\operatorname{Cov}[z_i, z_j]\big]_{1 \leq i, j \leq n} = \mathbb{E}\big[(z - \mathbb{E}[z])(z - \mathbb{E}[z])^T\big] = \mathbb{E}[z z^T] - \mathbb{E}[z]\mathbb{E}[z]^T$$
Let $A$ be an $m \times n$ deterministic matrix, and $b$ be an $m$ vector. Then:

- $\mathbb{E}[Az + b] = A\,\mathbb{E}[z] + b$
- $\operatorname{Var}[Az + b] = A \operatorname{Var}[z] A^T$.
Attention: Transpose is on the right (variance is a matrix).
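A quick Monte Carlo sanity check of these two rules (a sketch with an arbitrary $A$ and $b$, and $z \sim \mathcal{N}(0, I_3)$):

```r
set.seed(2)
A <- matrix(c(1, 2, 0, 1, -1, 3), nrow = 2)   # a 2 x 3 matrix
b <- c(1, -1)
z <- matrix(rnorm(3 * 1e5), nrow = 3)         # 1e5 draws of z, with E[z] = 0, Var[z] = I_3
w <- A %*% z + b                              # b is recycled column-wise
rowMeans(w)                                   # close to A E[z] + b = b
var(t(w))                                     # close to A Var[z] A^T = A A^T
A %*% t(A)
```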
The OLS estimator $\hat{\beta} = (X^T X)^{-1} X^T y$ is unbiased:
$$\mathbb{E}[\hat{\beta}] = \mathbb{E}\big[(X^T X)^{-1} X^T y\big] = (X^T X)^{-1} X^T \mathbb{E}[y]$$
And:
$$\mathbb{E}[y] = \mathbb{E}[X\beta + \epsilon] = X\beta + \mathbb{E}[\epsilon] = X\beta$$
Hence:
$$\mathbb{E}[\hat{\beta}] = (X^T X)^{-1} X^T X \beta = \beta.$$
The OLS estimator $\hat{\beta} = (X^T X)^{-1} X^T y$ has variance $\sigma^2 (X^T X)^{-1}$:
$$\operatorname{Var}[\hat{\beta}] = \operatorname{Var}\big[(X^T X)^{-1} X^T y\big] = (X^T X)^{-1} X^T \operatorname{Var}[y] \big[(X^T X)^{-1} X^T\big]^T = (X^T X)^{-1} X^T \operatorname{Var}[y]\, X (X^T X)^{-1}$$
And:
$$\operatorname{Var}[y] = \operatorname{Var}[X\beta + \epsilon] = \operatorname{Var}[\epsilon] = \sigma^2 I_n$$
Hence:
$$\operatorname{Var}[\hat{\beta}] = (X^T X)^{-1} X^T [\sigma^2 I_n] X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1} (X^T X)(X^T X)^{-1} = \sigma^2 (X^T X)^{-1}.$$
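These two formulas can be checked by simulation; a sketch with a fixed design and known $\beta$ and $\sigma$ (values chosen arbitrarily here):

```r
set.seed(1)
n <- 50; sigma <- 2
X <- cbind(1, runif(n), runif(n))     # fixed design, intercept included
beta <- c(-2, 3, -1)
XtX_inv <- solve(crossprod(X))
beta_hats <- replicate(5000, {
  y <- X %*% beta + rnorm(n, sd = sigma)
  drop(XtX_inv %*% crossprod(X, y))
})
rowMeans(beta_hats)                              # close to beta: E[beta_hat] = beta
max(abs(var(t(beta_hats)) - sigma^2 * XtX_inv))  # small: Var[beta_hat] = sigma^2 (X^T X)^{-1}
```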
Definition: Let $S_1$ and $S_2$ be two real $n \times n$ symmetric matrices. We say that $S_1$ is smaller than $S_2$, and write $S_1 \leq S_2$, if $S_2 - S_1$ is positive semi-definite, or, equivalently:
$$z^T S_1 z \leq z^T S_2 z \quad \forall z \in \mathbb{R}^n.$$
The OLS estimator $\hat{\beta}$ is the BLUE (Best Linear Unbiased Estimator): it is the linear unbiased estimator with minimal variance.
Remark: The OLS estimator $\hat{\beta} = (X^T X)^{-1} X^T y = A y$ is indeed a linear function of $y$, with $A = (X^T X)^{-1} X^T$.
Let $u$ and $v$ be two random vectors of size $n$. Then:
$$\operatorname{Var}[u + v] = \operatorname{Var}[u] + \operatorname{Var}[v] + \operatorname{Cov}[u ; v] + \operatorname{Cov}[v ; u],$$
where:
$$\operatorname{Cov}[u ; v] = \mathbb{E}[u v^T] - \mathbb{E}[u]\mathbb{E}[v]^T = \operatorname{Cov}[v ; u]^T$$
Let $\bar{\beta}$ be a linear unbiased estimator of $\beta$. Let's show that $\operatorname{Var}[\hat{\beta}] \leq \operatorname{Var}[\bar{\beta}]$.
$$\operatorname{Var}[\bar{\beta}] = \operatorname{Var}[\bar{\beta} - \hat{\beta} + \hat{\beta}] = \operatorname{Var}[\bar{\beta} - \hat{\beta}] + \operatorname{Var}[\hat{\beta}] + \operatorname{Cov}[\bar{\beta} - \hat{\beta} ; \hat{\beta}] + \operatorname{Cov}[\bar{\beta} - \hat{\beta} ; \hat{\beta}]^T,$$
i.e.
$$\operatorname{Var}[\bar{\beta}] - \operatorname{Var}[\hat{\beta}] = \operatorname{Var}[\bar{\beta} - \hat{\beta}] + \operatorname{Cov}[\bar{\beta} - \hat{\beta} ; \hat{\beta}] + \operatorname{Cov}[\bar{\beta} - \hat{\beta} ; \hat{\beta}]^T$$

We need to prove that $\operatorname{Var}[\bar{\beta}] - \operatorname{Var}[\hat{\beta}]$ is positive semi-definite. As $\operatorname{Var}[\bar{\beta} - \hat{\beta}]$ is positive semi-definite, we just need to prove:
$$\operatorname{Cov}[\bar{\beta} - \hat{\beta} ; \hat{\beta}] = 0.$$
$\bar{\beta}$ is a linear unbiased estimator of $\beta$. Hence $\bar{\beta} = B y$ for some deterministic matrix $B$, and:
$$\mathbb{E}[\bar{\beta}] = \mathbb{E}[B y] = B \mathbb{E}[y] = B \mathbb{E}[X\beta + \epsilon] = B X \beta = \beta$$
As this holds for all $\beta$, we get $B X = I_p$.
Let's show that $\operatorname{Cov}[\bar{\beta} - \hat{\beta} ; \hat{\beta}] = 0$. We have:
$$\operatorname{Cov}[\bar{\beta} - \hat{\beta} ; \hat{\beta}] = \operatorname{Cov}[\bar{\beta} ; \hat{\beta}] - \operatorname{Var}[\hat{\beta}]$$
$$\operatorname{Cov}[\bar{\beta} ; \hat{\beta}] = \mathbb{E}[\bar{\beta}\hat{\beta}^T] - \mathbb{E}[\bar{\beta}]\mathbb{E}[\hat{\beta}]^T = \mathbb{E}\big[B y [(X^T X)^{-1} X^T y]^T\big] - \beta\beta^T = B\,\mathbb{E}[y y^T]\,X (X^T X)^{-1} - \beta\beta^T$$
and:
$$\mathbb{E}[y y^T] = \operatorname{Var}[y] + \mathbb{E}[y]\mathbb{E}[y]^T = \sigma^2 I_n + X\beta\beta^T X^T,$$
hence:
$$\operatorname{Cov}[\bar{\beta} ; \hat{\beta}] = \sigma^2 B X (X^T X)^{-1} + B X \beta\beta^T X^T X (X^T X)^{-1} - \beta\beta^T$$
As $B X = I_p$:
$$\operatorname{Cov}[\bar{\beta} ; \hat{\beta}] = \sigma^2 (X^T X)^{-1} + \beta\beta^T - \beta\beta^T = \sigma^2 (X^T X)^{-1}$$
Finally:
$$\operatorname{Cov}[\bar{\beta} - \hat{\beta} ; \hat{\beta}] = \operatorname{Cov}[\bar{\beta} ; \hat{\beta}] - \operatorname{Var}[\hat{\beta}] = \sigma^2 (X^T X)^{-1} - \sigma^2 (X^T X)^{-1} = 0.$$
and:
$$\operatorname{Var}[\bar{\beta}] - \operatorname{Var}[\hat{\beta}] = \operatorname{Var}[\bar{\beta} - \hat{\beta}]$$
is positive semi-definite, i.e. $\operatorname{Var}[\bar{\beta}] \geq \operatorname{Var}[\hat{\beta}]$.
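To illustrate the Gauss-Markov property, here is a small simulation (a sketch, not from the slides) comparing the OLS slope of a simple regression with another linear unbiased estimator of the slope, the two-point estimator $(y_n - y_1)/(x_n - x_1)$: both are unbiased, but OLS has the smaller variance.

```r
set.seed(3)
n <- 30
x <- seq(1, 10, length.out = n)
sims <- replicate(5000, {
  y <- 1 + 2 * x + rnorm(n)                       # true intercept 1, slope 2, sigma = 1
  c(ols       = unname(coef(lm(y ~ x))[2]),       # OLS slope
    two_point = (y[n] - y[1]) / (x[n] - x[1]))    # another linear unbiased estimator
})
rowMeans(sims)        # both close to 2 (unbiased)
apply(sims, 1, var)   # the OLS slope has the smaller variance
```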
$$\hat{\epsilon} = y - \hat{y} = (I_n - P_X) y = P_{X^\perp} y = P_{X^\perp}(X\beta + \epsilon) = P_{X^\perp}\epsilon$$

$$\mathbb{E}[\hat{\epsilon}] = 0 \qquad \operatorname{Var}[\hat{\epsilon}] = \sigma^2 P_{X^\perp}$$

Bias:
$$\mathbb{E}[\hat{\epsilon}] = \mathbb{E}[P_{X^\perp}\epsilon] = P_{X^\perp}\mathbb{E}[\epsilon] = 0.$$

Variance:
$$\operatorname{Var}[\hat{\epsilon}] = \operatorname{Var}[P_{X^\perp}\epsilon] = P_{X^\perp}\operatorname{Var}[\epsilon]\,[P_{X^\perp}]^T = \sigma^2 P_{X^\perp}[P_{X^\perp}]^T = \sigma^2 P_{X^\perp} P_{X^\perp} = \sigma^2 P_{X^\perp}.$$
$$\mathbb{E}[\hat{y}] = X\beta \qquad \operatorname{Var}[\hat{y}] = \sigma^2 P_X$$

Bias:
$$\mathbb{E}[\hat{y}] = \mathbb{E}[X\hat{\beta}] = X\,\mathbb{E}[\hat{\beta}] = X\beta$$

Variance:
$$\operatorname{Var}[\hat{y}] = \operatorname{Var}[X\hat{\beta}] = X\operatorname{Var}[\hat{\beta}]X^T = X[\sigma^2(X^T X)^{-1}]X^T = \sigma^2 X(X^T X)^{-1}X^T = \sigma^2 P_X.$$
$$\operatorname{Cov}[\hat{\epsilon} ; \hat{y}] = 0.$$

$$\operatorname{Cov}[\hat{\epsilon} ; \hat{y}] = \operatorname{Cov}[\hat{\epsilon} ; y - \hat{\epsilon}] = \operatorname{Cov}[\hat{\epsilon} ; y] - \operatorname{Var}[\hat{\epsilon}] = \operatorname{Cov}[P_{X^\perp} y ; y] - \sigma^2 P_{X^\perp}$$
$$= P_{X^\perp}\operatorname{Var}[y] - \sigma^2 P_{X^\perp} = P_{X^\perp}[\sigma^2 I_n] - \sigma^2 P_{X^\perp} = \sigma^2 P_{X^\perp} - \sigma^2 P_{X^\perp} = 0.$$
$$\hat{\sigma}^2 = \frac{1}{n-p}\|\hat{\epsilon}\|^2 = \frac{1}{n-p}\,\mathrm{RSS}$$
is an unbiased estimator of $\sigma^2$.

Note: $p$ parameters → $n - p$ degrees of freedom.
$$\mathbb{E}[\hat{\sigma}^2] = \frac{1}{n-p}\mathbb{E}\big[\|\hat{\epsilon}\|^2\big]$$

Magic trick:
$$\|\hat{\epsilon}\|^2 = \operatorname{Tr}(\|\hat{\epsilon}\|^2) = \operatorname{Tr}(\hat{\epsilon}^T\hat{\epsilon}) = \operatorname{Tr}(\hat{\epsilon}\hat{\epsilon}^T)$$

We get:
$$\mathbb{E}\big[\|\hat{\epsilon}\|^2\big] = \mathbb{E}\big[\operatorname{Tr}(\hat{\epsilon}\hat{\epsilon}^T)\big] = \operatorname{Tr}\big(\mathbb{E}[\hat{\epsilon}\hat{\epsilon}^T]\big) \quad \text{[Trace is linear]}$$
$$= \operatorname{Tr}\big(\operatorname{Var}[\hat{\epsilon}]\big) \quad [\mathbb{E}[\hat{\epsilon}] = 0]$$
$$= \operatorname{Tr}(\sigma^2 P_{X^\perp}) = \sigma^2\operatorname{Tr}(P_{X^\perp}) = \sigma^2(n - p),$$
as $P_{X^\perp}$ is the projection matrix on a space of dimension $n - p$. Hence $\mathbb{E}[\hat{\sigma}^2] = \sigma^2$.
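This $\hat{\sigma}^2 = \mathrm{RSS}/(n - p)$ is, up to a square root, what `summary.lm()` reports as the residual standard error; a quick check on the Advertising data (a sketch, assuming `ad` is loaded):

```r
fit_ad <- lm(sales ~ TV + radio + newspaper, data = ad)
rss <- sum(residuals(fit_ad)^2)
n <- nrow(ad); p <- length(coef(fit_ad))   # p includes the intercept
c(by_hand = sqrt(rss / (n - p)), summary = summary(fit_ad)$sigma)  # both 1.686
```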
The prediction error $\hat{\epsilon}_{n+1} = y_{n+1} - \hat{y}_{n+1}$ is such that:
$$\mathbb{E}[\hat{\epsilon}_{n+1}] = 0 \qquad \operatorname{Var}[\hat{\epsilon}_{n+1}] = \sigma^2\big(1 + x_{n+1}(X^T X)^{-1}(x_{n+1})^T\big)$$

Remarks:

- $x_{n+1}$ is a row vector of dimension $1 \times p$.
- $X$ is a matrix of dimension $n \times p$.
- $X^T X$ is a matrix of dimension $p \times p$.
- $x_{n+1}(X^T X)^{-1}(x_{n+1})^T$ is a scalar.
Bias:
$$\mathbb{E}[\hat{\epsilon}_{n+1}] = \mathbb{E}[y_{n+1} - x_{n+1}\hat{\beta}] = \mathbb{E}[y_{n+1}] - x_{n+1}\mathbb{E}[\hat{\beta}] = x_{n+1}\beta - x_{n+1}\beta = 0$$

Because $\hat{y}_{n+1}$ does not depend on $\epsilon_{n+1}$ (it is built from the first $n$ observations only):
$$\operatorname{Var}[\hat{\epsilon}_{n+1}] = \operatorname{Var}[y_{n+1} - \hat{y}_{n+1}] = \operatorname{Var}[y_{n+1}] + \operatorname{Var}[\hat{y}_{n+1}]$$

Hence:
$$\operatorname{Var}[\hat{\epsilon}_{n+1}] = \sigma^2 + \operatorname{Var}[x_{n+1}\hat{\beta}] = \sigma^2 + x_{n+1}\operatorname{Var}[\hat{\beta}](x_{n+1})^T = \sigma^2 + \sigma^2 x_{n+1}(X^T X)^{-1}(x_{n+1})^T$$
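A sketch of this variance formula on the Advertising data, with $\sigma^2$ replaced by $\hat{\sigma}^2$ and a hypothetical new budget $x_{n+1}$ (the values below are made up for illustration):

```r
fit_ad <- lm(sales ~ TV + radio + newspaper, data = ad)
X <- model.matrix(fit_ad)
x_new <- c(1, 150, 20, 30)                 # hypothetical budgets: intercept, TV, radio, newspaper
sigma2_hat <- summary(fit_ad)$sigma^2
h <- drop(t(x_new) %*% solve(crossprod(X)) %*% x_new)   # x_{n+1} (X^T X)^{-1} x_{n+1}^T
sqrt(sigma2_hat * (1 + h))                 # estimated sd of the prediction error
```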
Assuming $\mathbb{1}$ is in the model, and using Pythagoras' theorem (valid since $\hat{y} - \bar{y}\mathbb{1} \in \mathcal{M}(X)$ and $\hat{\epsilon} \perp \mathcal{M}(X)$):
$$\|y - \bar{y}\mathbb{1}\|^2 = \|\hat{y} - \bar{y}\mathbb{1} + y - \hat{y}\|^2 = \|\hat{y} - \bar{y}\mathbb{1} + \hat{\epsilon}\|^2 = \|\hat{y} - \bar{y}\mathbb{1}\|^2 + \|\hat{\epsilon}\|^2$$
$$\|y - \bar{y}\mathbb{1}\|^2 = \|\hat{y} - \bar{y}\mathbb{1}\|^2 + \|\hat{\epsilon}\|^2 \qquad \Longleftrightarrow \qquad \mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$$
$$R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} = \frac{\|\hat{y} - \bar{y}\mathbb{1}\|^2}{\|y - \bar{y}\mathbb{1}\|^2} = 1 - \frac{\|\hat{\epsilon}\|^2}{\|y - \bar{y}\mathbb{1}\|^2} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$
$$R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} = \frac{\|\hat{y} - \bar{y}\mathbb{1}\|^2}{\|y - \bar{y}\mathbb{1}\|^2} = \cos^2(\theta),$$
with $\theta$ the angle between $y - \bar{y}\mathbb{1}$ and $\hat{y} - \bar{y}\mathbb{1}$.

$$\mathrm{TSS} = \|y - \bar{y}\mathbb{1}\|^2 = \sum_{i=1}^n (y_i - \bar{y})^2 \quad \text{("total inertia")}$$
$$\mathrm{RSS} = \|y - \hat{y}\|^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 \quad \text{("intra-class inertia")}$$
$$\mathrm{ESS} = \|\hat{y} - \bar{y}\mathbb{1}\|^2 = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 \quad \text{("inter-class inertia")}$$

$$R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$
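A direct check of the decomposition $\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$ and of $R^2$ on the Advertising fit (a sketch, assuming `ad` is loaded):

```r
fit_ad <- lm(sales ~ TV + radio + newspaper, data = ad)
y_obs <- ad$sales; y_hat <- fitted(fit_ad)
TSS <- sum((y_obs - mean(y_obs))^2)
ESS <- sum((y_hat - mean(y_obs))^2)
RSS <- sum((y_obs - y_hat)^2)
c(TSS = TSS, ESS_plus_RSS = ESS + RSS)                        # equal
c(by_hand = ESS / TSS, summary = summary(fit_ad)$r.squared)   # both 0.8972
```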
$$y_i = -2 + 3 x_{i1} - x_{i2} + \epsilon_i$$

```r
set.seed(12890926)
## Predictors
n <- 100
x_1 <- runif(n, min = -2, max = 2)
x_2 <- runif(n, min = 0, max = 4)
## Noise
eps <- rnorm(n, mean = 0, sd = 5)
## Model sim
beta_0 <- -2; beta_1 <- 3; beta_2 <- -1
y_sim <- beta_0 + beta_1 * x_1 + beta_2 * x_2 + eps
```
```r
fit <- lm(y_sim ~ x_1 + x_2)
summary(fit)$r.squared
```

```
## [1] 0.3337178
```

```r
## unrelated noise variable x_3
x_3 <- runif(n, min = -4, max = 0)
## Fit
fit2 <- lm(y_sim ~ x_1 + x_2 + x_3)
summary(fit2)$r.squared
```

```
## [1] 0.3404782
```
The more predictors there are, the higher $R^2$ gets!
$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\|\hat{\epsilon}\|^2}{\|y - \bar{y}\mathbb{1}\|^2} = 1 - \frac{\|y - X\hat{\beta}\|^2}{\|y - \bar{y}\mathbb{1}\|^2}$$

If $X' = (X \;\; x_{p+1})$ has one more column, then:
$$\|y - X'\hat{\beta}'\|^2 = \min_{\beta' \in \mathbb{R}^{p+1}} \|y - X'\beta'\|^2 = \min_{\substack{\beta \in \mathbb{R}^p \\ \beta_{p+1} \in \mathbb{R}}} \|y - X\beta - x_{p+1}\beta_{p+1}\|^2 \leq \min_{\beta \in \mathbb{R}^p} \|y - X\beta\|^2$$
Hence:
$$\|y - X'\hat{\beta}'\|^2 \leq \|y - X\hat{\beta}\|^2,$$
and $R^2$ can only increase when a predictor is added.
$$R^2_a = 1 - \frac{\mathrm{RSS}/(n-p)}{\mathrm{TSS}/(n-1)} = 1 - \frac{(n-1)\,\mathrm{RSS}}{(n-p)\,\mathrm{TSS}} = 1 - \frac{n-1}{n-p}\,(1 - R^2)$$

Penalize for the number of predictors $p$.

Attention: $p$ includes the intercept (rank of the matrix $X$).
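A sketch computing $R^2_a$ by hand on the Advertising fit, matching `summary()` (here $n = 200$ and $p = 4$, intercept included):

```r
fit_ad <- lm(sales ~ TV + radio + newspaper, data = ad)
n_ad <- nrow(ad); p_ad <- length(coef(fit_ad))        # 200 and 4
r2 <- summary(fit_ad)$r.squared
c(by_hand = 1 - (n_ad - 1) / (n_ad - p_ad) * (1 - r2),
  summary = summary(fit_ad)$adj.r.squared)            # both 0.8956
```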
$$y_i = -2 + 3 x_{i1} - x_{i2} + \epsilon_i$$

```r
fit <- lm(y_sim ~ x_1 + x_2)
summary(fit)$adj.r.squared
```

```
## [1] 0.31998
```

```r
## unrelated noise variable x_3
x_3 <- runif(n, min = -4, max = 0)
## Fit
fit2 <- lm(y_sim ~ x_1 + x_2 + x_3)
summary(fit2)$adj.r.squared
```

```
## [1] 0.3133534
```
```r
fit_all <- lm(sales ~ TV + radio + newspaper, data = ad)
summary(fit_all)
```

```
## 
## Call:
## lm(formula = sales ~ TV + radio + newspaper, data = ad)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8277 -0.8908  0.2418  1.1893  2.8292 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.938889   0.311908   9.422   <2e-16 ***
## TV           0.045765   0.001395  32.809   <2e-16 ***
## radio        0.188530   0.008611  21.893   <2e-16 ***
## newspaper   -0.001037   0.005871  -0.177     0.86    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.686 on 196 degrees of freedom
## Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956 
## F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16
```
Confidence intervals for $(\hat{\beta}, \hat{\sigma}^2)$?

Can we test $\beta = 0$ (i.e. no linear trend)?