Ewen Gallic ewen.gallic@gmail.com
MASTER in Economics - Track EBDS - 2nd Year 2020-2021
This part presents some concepts of statistical learning, through the prism of regression.
1. Some context
Model specification
In a regression problem, the aim is to understand how a response variable y varies, conditionally on the available information on some predictors x.
Let us take an example: that of the salaries of Professors in the US in 2008-09.
The salary of a professor may be linked, among other things, to the number of years since they obtained their Ph.D.
[Figure: Salary in 2008-09 (nine-month salary, in dollars) against years since Ph.D, showing the observations, a linear regression fit, a loess fit, and the conditional mean.]
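The figure can be reproduced with a short sketch. The code below is a minimal illustration; the carData package and its Salaries dataset are an assumption about the data source (its 397 professors match the data used in this part):

```r
# A sketch of the figure above, assuming carData::Salaries.
library(carData)
library(ggplot2)

ggplot(Salaries, aes(x = yrs.since.phd, y = salary)) +
  geom_point() +                                              # observations
  geom_smooth(method = "lm", se = FALSE, colour = "blue") +   # linear regression
  geom_smooth(method = "loess", se = FALSE, colour = "red") + # loess
  labs(x = "Years since Ph.D",
       y = "Salary in 2008-09 (nine-month salary, in dollars)")
```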
Salary as a function of years since Ph.D
Here, the linear regression suggests that, on average, the salary increases with the number of years since Ph.D:
• the slope of 985.3 indicates that each additional year since the Ph.D is associated with an increase of about 985 dollars in nine-month salary.
But the relationship does not seem to be linear...
It should be noted here that:
• the regression analysis does not depend on a generative model here (a model explaining how the data are generated)
• there are no causal claims regarding how the mean salary would change if the number of years since Ph.D were altered
• there is no statistical inference
We could add some predictors to the model to get a better story on what is going on with salary:
• some omitted variables may play an important role in explaining the variations.
We can also perform some regression analysis if the response variable is categorical.
Let us look at the salary in a different way: let us split it into two categories, either < $100k or ≥ $100k.
For each decile of years since Ph.D, we can plot the conditional proportions.
[Figure: Conditional proportions of salaries < $100k and ≥ $100k, for each decile of years since Ph.D (decile breakpoints: 1, 5, 10, 13, 17.4, 21, 25, 30, 35, 40, 56 years).]
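These conditional proportions can be computed with a sketch like the following, still assuming carData::Salaries:

```r
# Conditional proportions of the two salary categories by deciles of
# years since Ph.D.
library(carData)
library(dplyr)

Salaries |>
  mutate(
    decile = cut(yrs.since.phd,
                 breaks = quantile(yrs.since.phd, probs = seq(0, 1, by = 0.1)),
                 include.lowest = TRUE),
    salary_type = ifelse(salary >= 100000, ">=100k", "<100k")
  ) |>
  count(decile, salary_type) |>
  group_by(decile) |>
  mutate(prop = n / sum(n))
```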
Levels of regression analysis
Berk (2008) mentions three levels of regression analysis:
• Level I regression analysis:
  – aims at describing the data
  – assumption free
  – should not be neglected
• Level II regression analysis:
  – based on statistical inference
  – uses results from Level I regression analysis
  – use with real data may be challenging
  – allows one to make predictions
• Level III regression analysis:
  – based on causal inference
  – uses Level I analysis, sometimes coupled with Level II
  – relies more on algorithmic methods than on model-based methods.
2. The linear regression
Some references
• Berk (2008). Statistical learning from a regression perspective, volume 14. Springer.
• Cornillon and Matzner-Løber (2007). Régression: théorie et applications. Springer.
• James et al. (2013). An introduction to statistical learning, volume 112. Springer.
Linear regression combines Level I and Level II perspectives.
It is useful when one wants to predict a quantitative response.
A lot of newer statistical learning approaches can be seen as generalizations or extensions of linear regression, as noted in James et al. (2013).
2.1 Simple linear regression
Let us consider first the case of simple linear regression.
We aim at predicting a quantitative response variable y using a single predictor x (or regressor):
• y is an n × 1 numerical response variable, where n represents the number of observations
• x is an n × 1 predictor.
We assume there exists a linear relationship between y and x such that:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$
where $\varepsilon_i$ is an error term normally distributed with zero mean and variance $\sigma^2$, i.e., $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$.
Principle
In Eq. 1, the coefficients (or parameters) $\beta_0$ (the constant) and $\beta_1$ (the slope) are unknown and must be estimated.
These coefficients are estimated using a training sample.
The estimates of $\beta_0$ and $\beta_1$ are denoted, respectively, $\hat{\beta}_0$ and $\hat{\beta}_1$.
Once they are estimated using a learning procedure (in this case linear regression), they can be used to predict values of y for some value $x_0$:
$$\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0 \qquad (2)$$
2.1.1 Estimating the coefficients
To estimate $\beta_0$ and $\beta_1$, we rely on a set of training examples $\{(x_1, y_1), \ldots, (x_n, y_n)\}$.
For example, let us go back to our data describing the nine-month salary of professors (the response variable) and look at the relationship between the salary and the years since Ph.D (x).
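As an illustration, a minimal sketch of the fit, again assuming the carData::Salaries dataset:

```r
# Fit of the simple linear regression of salary on years since Ph.D.
library(carData)
data(Salaries)
fit <- lm(salary ~ yrs.since.phd, data = Salaries)
coef(fit)   # estimated intercept and slope (salary in dollars)
```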
Figure 1: Varying the intercept. Figure 2: Varying the slope.
There is an infinity of possible values that one can pick for $\hat{\beta}_0$ and $\hat{\beta}_1$.
However, we want to find an estimation that leads to a line that is as close as possible to the points:
but what does “close” mean?
[Figures 1 and 2: scatter plots of salary ($10,000) against years since Ph.D, with two candidate lines: intercept 8 and slope 0.1, versus intercept 9.17 and slope −0.05.]
The most common criterion we want to minimize is known as the least squares criterion.
The predictions $\hat{y}_i$ for each of the $x_i$, $i = 1, \ldots, n$, are given by $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
Let $e_i = y_i - \hat{y}_i$ denote the i-th residual, i.e., the difference between the observed value and its prediction by the linear model.
The residual sum of squares is defined as:
$$\mathrm{RSS} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2. \qquad (3)$$
We aim at minimizing this criterion.
It can easily be shown that the minimization of the RSS leads to:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \qquad (4)$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
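As a sketch, Eq. 4 can be computed directly, here with the salary rescaled to $10,000 units as in the figures (still assuming carData::Salaries):

```r
# Closed-form least squares estimates (Eq. 4).
library(carData)
data(Salaries)
x <- Salaries$yrs.since.phd
y <- Salaries$salary / 10000   # salary in $10,000 units
n <- length(x)
beta_1 <- (sum(x * y) - n * mean(x) * mean(y)) / (sum(x^2) - n * mean(x)^2)
beta_0 <- mean(y) - beta_1 * mean(x)
c(beta_0 = beta_0, beta_1 = beta_1)   # compare with coef(lm(y ~ x))
```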
Least squares coefficient estimates
Here, the least squares coefficient estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are, respectively, 9.1719 and 0.0985 (with the salary measured in $10,000 units).
Figure 3: Least squares fit for the regression of the nine-month salary of Professors onto years since Ph.D.
We can have a look at the RSS when we vary the values of $\hat{\beta}_0$ and $\hat{\beta}_1$:
Figure 4: Surface plot of the RSS depending on the values of $\hat{\beta}_0$ and $\hat{\beta}_1$.
Figure 5: Contour plot of the RSS depending on the values of $\hat{\beta}_0$ and $\hat{\beta}_1$.
2.1.2 Accuracy of the coefficient estimates
The estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are point estimates.
When they are estimated by least squares, they are:
• unbiased: $\mathbb{E}(\hat{\beta}_0) = \beta_0$ and $\mathbb{E}(\hat{\beta}_1) = \beta_1$
• efficient: $\mathbb{V}(\hat{\beta}_0)$ and $\mathbb{V}(\hat{\beta}_1)$ are minimal among linear unbiased estimators
• convergent: $\lim_{n\to+\infty} \mathbb{V}(\hat{\beta}_0) = 0$ and $\lim_{n\to+\infty} \mathbb{V}(\hat{\beta}_1) = 0$
They are called BLUE (Best Linear Unbiased Estimators).
It is easy to show that:
$$\mathbb{V}(\hat{\beta}_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right], \qquad \mathbb{V}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad (5)$$
where $\sigma^2$ can be estimated by:
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - 2} = \frac{\sum_{i=1}^{n} e_i^2}{n - 2}.$$
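A minimal sketch of Eq. 5, reusing x, y, n, beta_0 and beta_1 from the previous chunk:

```r
# Estimated standard errors of the coefficients (Eq. 5).
e <- y - (beta_0 + beta_1 * x)       # residuals
sigma2_hat <- sum(e^2) / (n - 2)     # estimate of the error variance
se_beta_1 <- sqrt(sigma2_hat / sum((x - mean(x))^2))
se_beta_0 <- sqrt(sigma2_hat * (1 / n + mean(x)^2 / sum((x - mean(x))^2)))
c(se_beta_0 = se_beta_0, se_beta_1 = se_beta_1)   # compare with summary(lm(y ~ x))
```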
Figure 6: A: true relationship (in red), observed values of y (points) and least squares line (in blue). B: true relationship (in red), current least squares line (in blue), previous least squares lines (in gray).
Figure 7: Means and standard deviations of the estimates of $\beta_0$ and $\beta_1$, depending on the number of replications.
We wish to test whether a coefficient $\theta$, $\theta \in \{\beta_0, \beta_1\}$, is equal to a specific value $\theta_0$:
$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta \neq \theta_0$$
We know that $\hat{\theta} \sim \mathcal{N}\left(\theta, \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right)$, so:
$$\frac{\hat{\theta} - \theta}{\sigma \big/ \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}} \sim \mathcal{N}(0, 1).$$
Hypothesis tests
As $\frac{\sum_{i=1}^{n} e_i^2}{\sigma^2} \sim \chi^2_{n-2}$, we can define a variable T as:
$$T = \frac{\dfrac{\hat{\theta} - \theta}{\sigma \big/ \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}}{\sqrt{\dfrac{\sum_{i=1}^{n} e_i^2}{\sigma^2}} \Big/ \sqrt{n - 2}} \sim \mathcal{S}t(n-2)$$
We can show that the expression of T simplifies to:
$$T = \frac{\hat{\theta} - \theta}{\hat{\sigma}_{\hat{\theta}}}$$
It is thus possible to perform the following test:
$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta \neq \theta_0,$$
knowing that $\dfrac{\hat{\theta} - \theta}{\hat{\sigma}_{\hat{\theta}}} \sim \mathcal{S}t(n-2)$.
We need to find the following probability:
$$\mathbb{P}\left(-t_{\alpha/2} < \frac{\hat{\theta} - \theta}{\hat{\sigma}_{\hat{\theta}}} < t_{\alpha/2}\right)$$
We therefore need to compute a t-statistic, which measures the number of standard deviations that $\hat{\theta}$ is away from $\theta_0$:
$$t_{\mathrm{obs.}} = \frac{\hat{\theta} - \theta_0}{\hat{\sigma}_{\hat{\theta}}}$$
• if $t_{\mathrm{obs.}} \in \left[-t_{\alpha/2}, t_{\alpha/2}\right]$: we do not reject the null hypothesis ($H_0$) with a first-order risk of α%
• if $t_{\mathrm{obs.}} \notin \left[-t_{\alpha/2}, t_{\alpha/2}\right]$: we reject the null hypothesis ($H_0$) with a first-order risk of α%
Most of the time, we are interested in a specific case:
$$H_0: \theta = 0 \quad \text{vs.} \quad H_1: \theta \neq 0.$$
In such a case, the t-statistic becomes:
$$T = \frac{\hat{\theta} - 0}{\hat{\sigma}_{\hat{\theta}}} = \frac{\hat{\theta}}{\hat{\sigma}_{\hat{\theta}}}$$
The observed value is $t_{\mathrm{obs.}} = \hat{\theta} / \hat{\sigma}_{\hat{\theta}}$.
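A sketch of this test for the slope ($H_0: \beta_1 = 0$), reusing the objects computed earlier:

```r
# t-statistic and two-sided p-value for H0: beta_1 = 0.
t_obs <- beta_1 / se_beta_1
p_value <- 2 * pt(abs(t_obs), df = n - 2, lower.tail = FALSE)
c(t_obs = t_obs, p_value = p_value)   # compare with summary(lm(y ~ x))
```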
Hypothesis tests: confidence interval
We can also use the standard error of the coefficient estimates to construct a confidence interval:
$$\widehat{\mathrm{I.C.}}_{\theta}(1 - \alpha) = \left[\hat{\theta} \pm t_{\alpha/2} \times \hat{\sigma}_{\hat{\theta}}\right]. \qquad (6)$$
If the interval contains 0, then we can conclude that the coefficient $\theta$ is not statistically different from zero (at the α% level of significance).
We can also compute the probability of observing any value equal to |t| or larger in absolute value, assuming $\theta = 0$ (this probability is known as the p-value).
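A sketch of Eq. 6 for the slope, at the 95% confidence level:

```r
# 95% confidence interval for beta_1 (Eq. 6).
alpha <- 0.05
t_crit <- qt(1 - alpha / 2, df = n - 2)
beta_1 + c(-1, 1) * t_crit * se_beta_1   # compare with confint(lm(y ~ x))
```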
Table 1: Statistical models

                 Least squares
 (Intercept)     9.17*** (0.28)
 yrs.since.phd   0.10*** (0.01)
 R²              0.18
 Adj. R²         0.17
 Num. obs.       397

***p < 0.001; **p < 0.01; *p < 0.05
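Table 1 matches the regression on the salary in $10,000 units; a sketch of how it can be reproduced (the texreg call is an assumption about the table layout used here):

```r
# Reproduce the figures of Table 1 (salary in $10,000 units).
fit10 <- lm(I(salary / 10000) ~ yrs.since.phd, data = Salaries)
summary(fit10)              # coefficients, standard errors, R-squared
# texreg::screenreg(fit10)  # compact table layout, if texreg is installed
```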
2.1.3 Accuracy of the model
Recall that the linear regression is a supervised learning method. Hence, we can compare the predictions we obtain with the observed values of the output variable.
We want to have an idea of the quality of the estimation, to know how well the model fits the data.
To that end, we usually use several metrics, among which:
• the root mean squared error (RMSE)
• the residual standard error (RSE)
• the R² statistic.
Accuracy of the model: RMSE
The mean squared error (MSE) is an estimate of the average of the squares of the errors:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \qquad (7)$$
The root mean squared error is the square root of the MSE:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} = \sqrt{\frac{\mathrm{RSS}}{n}}, \qquad (8)$$
where $\mathrm{RSS} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.
The value of the RMSE is always non-negative.
A value of 0 indicates a perfect fit to the data.
Accuracy of the model: RSE
Recall that the linear model contains an error term (ε). Hence, we will not be able to perfectly predict the response variable.
The Residual Standard Error is the average amount by which the response deviates from the true regression line. It is an estimate of the standard deviation of ε:
$$\mathrm{RSE} = \sqrt{\frac{1}{n-2}\mathrm{RSS}} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}. \qquad (9)$$
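A sketch of Eqs. 7-9, reusing the residuals e computed earlier (salary in $10,000 units):

```r
# Fit-quality metrics.
rss  <- sum(e^2)
mse  <- rss / n               # mean squared error
rmse <- sqrt(mse)             # root mean squared error
rse  <- sqrt(rss / (n - 2))   # residual standard error, sigma(lm(y ~ x)) in R
c(rmse = rmse, rse = rse)
```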
In our example of the regression of salaries onto years since Ph.D, the value of the RSE is 2.7534 (with the salary measured in $10,000 units).
This means that the actual salary deviates from the true regression line by approximately $27,534, on average.
The mean salary in the data is about $113,706. Hence, the percentage error of any prediction based on our estimation is roughly 2.7534/11.37065 ≈ 24%.
Accuracy of the model: R²
Now, let us turn to the R² statistic, which provides another method to assess the quality of the fit.
The R² measures the proportion of variance explained. It takes a value between 0 and 1.
Let us illustrate this.
The variations of y are only partially explained by those of x.
[Figure 8: Variation from $y_1$ to $y_2$, associated with the variation from $x_1$ to $x_2$.]
As shown in Figure 8, the variation from $y_1$ to $y_2$ is partially explained by the variation from $x_1$ to $x_2$.
The quality of fit at each point, as measured by the total variation, can therefore be broken down into two parts:
• the explained variation
• the residual variation
using the average point $(\bar{x}, \bar{y})$ as reference, i.e.:
$$\underbrace{y_i - \bar{y}}_{\text{total variation}} = \underbrace{\hat{y}_i - \bar{y}}_{\text{explained variation}} + \underbrace{y_i - \hat{y}_i}_{\text{residual variation}}.$$
The closer A is to Â, the stronger the explained variation is, relatively.
[Figure 9: Decomposition of the variation: the point $A = (x_i, y_i)$, its fitted counterpart $\hat{A} = (x_i, \hat{y}_i)$, the reference point $(\bar{x}, \bar{y})$, the explained variance and the residual variance.]
Thus, one way to assess the quality of the adjustment is to measure the following ratio:
$$\frac{\text{explained variance}}{\text{total variance}}$$
Or, over all observations:
$$R^2 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = \frac{\mathrm{ESS}}{\mathrm{TSS}} = \frac{\text{explained sum of squares}}{\text{total sum of squares}} \qquad (10)$$
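A sketch of Eq. 10, reusing x, y, beta_0 and beta_1 from the earlier chunks:

```r
# R-squared as ESS/TSS (Eq. 10).
y_hat <- beta_0 + beta_1 * x
ess <- sum((y_hat - mean(y))^2)   # explained sum of squares
tss <- sum((y - mean(y))^2)       # total sum of squares
ess / tss                         # about 0.18, as reported in Table 1
```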
We can write the R² differently, as we know that:
$$\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2 - \hat{\beta}_1^2 \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2 - \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$$
Thus:
$$R^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2 - \sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \qquad (11)$$
The value of the R² lies between 0 and 1:
$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \implies 0 \leq R^2 \leq 1.$$
• When economic theory suggests that the relationship between the response and its predictor should be linear, we expect the value of the R² to be very close to one; otherwise, it suggests there might be something wrong with the generation of the data.
• In other situations, when the linear relationship is at best a rough approximation of the real functional form, we expect to find low values of the R².
It can be noted that, in the case of simple linear regression, the R² is equal to the squared correlation coefficient between x and y.
Indeed:
$$y_i - \hat{y}_i = y_i - \bar{y} + \bar{y} - \hat{y}_i = (y_i - \bar{y}) - (\hat{y}_i - \bar{y}) = (y_i - \bar{y}) - \left(\hat{\beta}_1 x_i + \hat{\beta}_0 - \hat{\beta}_1 \bar{x} - \hat{\beta}_0\right) = (y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x}).$$
Taking the squared value:
$$(y_i - \hat{y}_i)^2 = (y_i - \bar{y})^2 + \hat{\beta}_1^2 (x_i - \bar{x})^2 - 2\hat{\beta}_1 (y_i - \bar{y})(x_i - \bar{x})$$
R² and correlation
Which leads to:
$$(y_i - \hat{y}_i)^2 = (y_i - \bar{y})^2 + \hat{\beta}_1^2 (x_i - \bar{x})^2 - 2\hat{\beta}_1 (y_i - \bar{y})(x_i - \bar{x})$$
Summing over all individuals:
$$\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2 + \hat{\beta}_1^2 \sum_{i=1}^{n}(x_i - \bar{x})^2 - 2\hat{\beta}_1 \sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})$$
$$= \sum_{i=1}^{n}(y_i - \bar{y})^2 + \hat{\beta}_1^2 \sum_{i=1}^{n}(x_i - \bar{x})^2 - 2\hat{\beta}_1^2 \sum_{i=1}^{n}(x_i - \bar{x})^2$$
$$= \sum_{i=1}^{n}(y_i - \bar{y})^2 - \hat{\beta}_1^2 \sum_{i=1}^{n}(x_i - \bar{x})^2$$
It can indeed be shown that:
$$2\hat{\beta}_1^2 \sum_{i=1}^{n}(x_i - \bar{x})^2 = 2\hat{\beta}_1 \sum_{i=1}^{n}(x_i - \bar{x})^2 \, \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = 2\hat{\beta}_1 \sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x}).$$
We also have:
$$(\hat{y}_i - \bar{y}) = \hat{\beta}_1 x_i + \hat{\beta}_0 - \hat{\beta}_1 \bar{x} - \hat{\beta}_0 = \hat{\beta}_1 (x_i - \bar{x}).$$
By taking the squared value and summing over all individuals:
$$\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 = \hat{\beta}_1^2 \sum_{i=1}^{n}(x_i - \bar{x})^2. \qquad (12)$$
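This identity can be checked numerically with the quantities computed above:

```r
# Numerical check that R-squared equals the squared correlation.
all.equal(ess / tss, cor(x, y)^2)   # TRUE, up to floating-point error
```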