Practical cases and issues related to model fitting

(1)

HAL Id: hal-02803674

https://hal.inrae.fr/hal-02803674

Submitted on 5 Jun 2020


Nicolas Picard, Laurent Saint-André, Matieu Henry

To cite this version:

Nicolas Picard, Laurent Saint-André, Matieu Henry. Practical cases and issues related to model fitting. National technical staff training on allometric equations (ae) under the REDD Programme, UN-Reducing Emissions from Deforestation and forest Degradation (UN-REDD). Genève, CHE.; Food and Agriculture Organization (FAO). ITA.; Programme des Nations Unies pour l’Environnement (UNEP). FRA.; Institut National de la Recherche Agronomique (INRA). FRA.; Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD). FRA., Jun 2012, Hanoi, Vietnam. 35 p. ⟨hal-02803674⟩

(2)

Practical cases and issues related to model fitting

Pham Cuong, Inoguchi Akiko

Hanoi, June 18-22, 2012

Authors: N. Picard (CIRAD), L. Saint-André (CIRAD - INRA), and M. Henry (FAO)

(3)

Step by Step

Exploratory stage: getting a model for each compartment and each stratum (local model)

What variable, or what combination of variables, is to be used as input data?

What is the form of the relationship with each of these variables?

Aggregation stage: getting a model for each compartment, all strata pooled together (global model)

What are the relationships between the parameters of the local models and the strata characteristics?

What is the form of this relationship for each parameter?

Fitting of the complete model: one system of equations for all compartments and all strata

(4)

Linear Models

Linear regression: Principle

The model is written as follows:

$$Y_i = a + b\,X_i + \varepsilon_i$$

where a and b are the parameters of the model and $\varepsilon_i$ is the residual variation not explained by the model.

Fitting this equation consists in estimating the parameters a and b.

Usually, we use the least squares method, which consists in finding the parameters a and b that minimize the sum of squared errors:

$$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(Y_i - (a + b\,X_i)\right)^2$$

[Figure: plot of Y against X]



(5)

Linear Models

Linear regression: Principle

The least squares estimates are:

$$b = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}, \qquad a = \bar{Y} - b\,\bar{X}$$

a and b are two random variables.

The covariance between a and b is not null (meaning that the parameters of a given equation are not independent). An unbiased estimate of the residual variance, on which the variances and the covariance of a and b depend, is given by:

$$s^2 = \frac{\sum_i \left(Y_i - (a + b\,X_i)\right)^2}{n - 2}$$

The standard deviations of a and b are given by:

$$\mathrm{sd}(b)^2 = \frac{s^2}{\sum_i (X_i - \bar{X})^2}, \qquad \mathrm{sd}(a)^2 = s^2\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum_i (X_i - \bar{X})^2}\right)$$

And their confidence intervals by:

$$b \pm t(n-2,\ p/2)\,\mathrm{sd}(b), \qquad a \pm t(n-2,\ p/2)\,\mathrm{sd}(a)$$

Usually p = 0.05, which gives the confidence interval of each parameter at the 95% level.
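As an illustration of the formulas on this slide, here is a minimal Python sketch that computes the least squares estimates of a and b, the residual variance and the standard deviations of the two parameters. The function names `fit_line` and `confidence_interval` and the data are hypothetical, and the Student t value is assumed to be read from a table rather than computed.

```python
import math


def fit_line(xs, ys):
    """Least squares fit of Y = a + b.X, with the standard deviations of a and b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Unbiased estimate of the residual variance (n - 2 degrees of freedom)
    s2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    sd_b = math.sqrt(s2 / sxx)
    sd_a = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return {"a": a, "b": b, "s2": s2, "sd_a": sd_a, "sd_b": sd_b}


def confidence_interval(estimate, sd, t_value):
    """Interval estimate +/- t.sd; t_value must be taken from a Student table."""
    return estimate - t_value * sd, estimate + t_value * sd


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_line(xs, ys)
# With n = 5, t(n - 2 = 3, p/2 = 0.025) is about 3.182
low_b, high_b = confidence_interval(fit["b"], fit["sd_b"], 3.182)
```

The t value is passed as an argument because the Python standard library has no Student quantile function; statistical software computes it directly.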

(6)

Linear Models



Linear regression: Analysis of variance

Dep Var: Y   N: 13   Multiple R: 0.828991   Squared multiple R: 0.687225
Adjusted squared multiple R: 0.658791   Standard error of estimate: 0.351724

Effect      Coefficient   Std Error   Std Coef   Tolerance   t         P(2 Tail)
CONSTANT    0.063555      0.378512    0.000000   .           0.16791   0.86970
X           0.704030      0.143206    0.828991   1.000000    4.91621   0.00046

Effect      Coefficient   Lower <95%>   Upper
CONSTANT    0.063555      -0.769544     0.896655
X           0.704030      0.388836      1.019223

Analysis of Variance

Source       Sum-of-Squares   df   Mean-Square   F-ratio     P
Regression   2.989959         1    2.989959      24.169106   0.000460
Residual     1.360810         11   0.123710

Reading the output:

Squared multiple R: coefficient of determination R²; the adjusted one is preferable

Coefficient: values of the parameters (a = intercept, b = parameter linked to X)

Std Error: standard deviations of the parameters

Lower, Upper: confidence intervals of the parameters

Regression sum of squares: $\sum_i (\hat{Y}_i - \bar{Y})^2$

Residual sum of squares: $\sum_{i=1}^{n} \varepsilon_i^2$



(7)

Linear Models

Can we do a linear regression?

[Figure: plot of Y against X]

(8)

Linear Models

[Figure: plot of Y against X]

Transformation: Y' = ln Y, X' = ln X?

[Figure: plot of ln Y against ln X]

(9)

Linear Models



[Figure: plot of Y against X]

Transformation: Y' = ln Y, X' = ln X?

[Figure: plot of ln Y against ln X]

(10)

Linear Models



[Figure: plot of Y against X]

Transformation: Y' = ln Y, X' = X?

[Figure: plot of ln Y against X]

(11)

Linear Models

[Figure: plot of Y against X]

Transformation: X' = X, Y' = ln(Y/(1-Y))?

[Figure: plot of ln(Y/(1-Y)) against X]

(12)

Linear Models

Why transform the data?

Power equation:

$$Y = a\,X^{b}\exp(\varepsilon) \quad\Longrightarrow\quad \ln Y = a' + b\,\ln(X) + \varepsilon$$

Exponential model:

$$Y = a\,\exp(b\,X + \varepsilon) \quad\Longrightarrow\quad \ln Y = a' + b\,X + \varepsilon$$

In both cases a' = ln a.

A linear relationship is always worth looking for, because the least squares solution is then explicit.

The transformation sometimes also stabilizes the variance.

But this is not always possible, and the transformed model may not suit the data set. So try and see!
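The log transformation of the power equation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fit_power_model` and the data are hypothetical, and the data are generated without error so that the fit recovers the parameters exactly.

```python
import math


def fit_power_model(xs, ys):
    """Fit Y = a.X^b.exp(eps) by linear regression of ln Y on ln X."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
         / sum((u - mean_x) ** 2 for u in log_x))
    a_prime = mean_y - b * mean_x          # a' = ln a
    return math.exp(a_prime), b            # back to a on the original scale


# Hypothetical, error-free data generated with a = 2 and b = 1.5
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [2.0 * x ** 1.5 for x in xs]
a, b = fit_power_model(xs, ys)
```

The same function fits the exponential model if `log_x` is replaced by the untransformed X values.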

(13)

Linear Models

Are the following equations linear, or can they be transformed to get a linear equation?

$Y = b\,X + \varepsilon$

$Y = b\,X\,\varepsilon$, for which $\ln Y = \ln(b\,X\,\varepsilon) = \ln b + \ln X + \ln(\varepsilon)$

$Y = b\,X\exp(\varepsilon)$

$Y = b\,X^{2} + \varepsilon$

$Y = X^{b}\exp(\varepsilon)$

$Y = X^{b} + \varepsilon$

$Y = b\,X + c\,X^{2} + \varepsilon$: yes, but with two highly correlated variables

(14)

Non - Linear Models

Non-Linear regression: Principle



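For linear models, the solution is explicit because the derivative of the model with respect to each parameter does not depend on the parameters of the equation.

For non-linear models, this is not the case: the derivatives depend on the parameters. For example, for the model

$$Y = \alpha\,e^{\beta X} + \varepsilon$$

the residual sum of squares is

$$SS_{res} = \sum_{i=1}^{n} \left(Y_i - \alpha\,e^{\beta X_i}\right)^2$$

and setting its derivatives with respect to each parameter to zero gives, for the estimates a and b of $\alpha$ and $\beta$:

$$\sum_{i=1}^{n} \left(Y_i - a\,e^{b X_i}\right)\left(-e^{b X_i}\right) = 0$$

$$\sum_{i=1}^{n} \left(Y_i - a\,e^{b X_i}\right)\left(-a\,X_i\,e^{b X_i}\right) = 0$$

Solving this system explicitly is too difficult. It is then necessary to use alternative methods.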

(15)

Non - Linear Models

To fit a non-linear model, it is necessary to proceed by iterations, which means that initial values have to be given to the parameters.

When the least squares method is used, the sum of squared errors (SSE) is calculated at each step (i.e. at each estimation of a new set of parameters). If the procedure is efficient, the SSE decreases at each step. At the end of the process, when this decrease becomes negligible, the model is said to have converged.

The most widely used iterative procedure is Gauss-Newton, but many other procedures are available. When there are problems in fitting a model, it is recommended to test several methods (e.g. fractional iteration, Marquardt).
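The iterative principle can be sketched as follows for the model Y = α.exp(β.X). This is a minimal Gauss-Newton illustration, not the implementation of any particular statistical package: the function names, the step-halving safeguard (the step is halved until the SSE decreases), the stopping tolerance and the data are all assumptions made for the example.

```python
import math


def sse(alpha, beta, xs, ys):
    """Sum of squared errors of the model Y = alpha.exp(beta.X)."""
    return sum((y - alpha * math.exp(beta * x)) ** 2 for x, y in zip(xs, ys))


def gauss_newton(xs, ys, alpha, beta, tol=1e-12, max_iter=100):
    """Iterate from the initial values; return the estimates and the SSE history."""
    history = [sse(alpha, beta, xs, ys)]
    for _ in range(max_iter):
        expo = [math.exp(beta * x) for x in xs]
        d_alpha = expo                                         # derivative w.r.t. alpha
        d_beta = [alpha * x * e for x, e in zip(xs, expo)]     # derivative w.r.t. beta
        res = [y - alpha * e for y, e in zip(ys, expo)]
        # Normal equations of the linearized problem (2 x 2 system)
        s_aa = sum(u * u for u in d_alpha)
        s_ab = sum(u * v for u, v in zip(d_alpha, d_beta))
        s_bb = sum(v * v for v in d_beta)
        g_a = sum(u * r for u, r in zip(d_alpha, res))
        g_b = sum(v * r for v, r in zip(d_beta, res))
        det = s_aa * s_bb - s_ab ** 2
        step_a = (s_bb * g_a - s_ab * g_b) / det
        step_b = (s_aa * g_b - s_ab * g_a) / det
        # Safeguard (an assumption of this sketch): halve the step until SSE decreases
        fraction = 1.0
        while fraction > 1e-10:
            trial = sse(alpha + fraction * step_a, beta + fraction * step_b, xs, ys)
            if trial < history[-1]:
                break
            fraction /= 2.0
        else:
            break                                  # no improving step was found
        alpha += fraction * step_a
        beta += fraction * step_b
        history.append(trial)
        if history[-2] - history[-1] < tol:        # negligible decrease: converged
            break
    return alpha, beta, history


# Hypothetical, error-free data generated with alpha = 2 and beta = 0.3
xs = [float(x) for x in range(10)]
ys = [2.0 * math.exp(0.3 * x) for x in xs]
alpha_hat, beta_hat, history = gauss_newton(xs, ys, alpha=1.5, beta=0.25)
```

The list `history` holds the SSE after each accepted step, which is the quantity monitored to decide whether the model has converged.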

(16)

Non - Linear Models



Graphical view of the model convergence with two parameters

[Figure: isovalue curves of the sum of squared errors (levels 40, 60, 80, 100) in the (b1, b2) plane, showing the successive iterations from the initial values (b1 initial, b2 initial) to the final values (b1 final, b2 final)]

(17)

Non - Linear Models



Importance of the initial values given to the parameters

[Figure: isovalue curves of the sum of squared errors (SSE) in the (b1, b2) plane, with two sets of initial values]

With the first set of initial values, the fit falls into a local minimum of SSE (between 20 and 40).

With the second set of initial values, the fit reaches the absolute minimum of SSE (< 20).

It is strongly recommended to test several sets of initial values.
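One simple way to compare several sets of initial values is to compute the SSE for each candidate before launching the iterative fit, and to start from the most promising ones. The sketch below does this for the model Y = α.exp(β.X); the function names, the candidate grid and the data are hypothetical, and this screening is a complement to, not a substitute for, running the fit from several starting points.

```python
import math


def sse(alpha, beta, xs, ys):
    """Sum of squared errors of the model Y = alpha.exp(beta.X)."""
    return sum((y - alpha * math.exp(beta * x)) ** 2 for x, y in zip(xs, ys))


def rank_initial_values(xs, ys, alphas, betas):
    """Return the candidate (SSE, alpha, beta) triplets sorted by increasing SSE."""
    candidates = [(sse(a, b, xs, ys), a, b) for a in alphas for b in betas]
    candidates.sort()
    return candidates


# Hypothetical, error-free data generated with alpha = 2 and beta = 0.3
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * math.exp(0.3 * x) for x in xs]
ranked = rank_initial_values(xs, ys, alphas=[0.5, 1.0, 2.0, 4.0], betas=[0.1, 0.3, 0.5])
best_sse, best_alpha, best_beta = ranked[0]
```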

(18)

Non - Linear Models



Non-linear regression

Dependent variable is Y

Source           Sum-of-Squares   df   Mean-Square
Regression       1.79138E+04      3    5971.269443
Residual         6.711670         20   0.335583
Total            1.79205E+04      23
Mean corrected   3284.344348      22

Raw R-square (1-Residual/Total) = 0.999625
Mean corrected R-square (1-Residual/Corrected) = 0.997956
R(observed vs predicted) square = 0.997965

                                                 Wald Confidence Interval
Parameter   Estimate    A.S.E.     Param/ASE    Lower <95%>   Upper
B1          40.269815   0.584758   68.865757    39.050031     41.489600
B2          0.029815    0.001760   16.941467    0.026144      0.033486
B3          1.454754    0.078017   18.646595    1.292013      1.617495

Asymptotic Correlation Matrix of Parameters

      B1          B2         B3
B1    1.000000
B2    -0.910171   1.000000
B3    -0.756906   0.939698   1.000000

Reading the output:

R²: the mean corrected one should be used

Estimate, Lower, Upper: values of the parameters and their confidence intervals

Asymptotic Correlation Matrix: correlation matrix between parameters

(19)

Goodness of fit

How to assess the goodness of fit (for linear and non-linear models)

• R² and the plot of Y against predicted Y

R² is an index of fit, to be used cautiously (see hereafter). Maximum value = 1; minimum value = 0.

• Values of the parameters and their confidence intervals

They help to identify convergence problems; usually the standard error should not exceed 10% of the parameter value.

• Correlations between parameters

If the correlations are too high, transform the variables or change the model equation.

• The RMSE (Root Mean Square Error, or residual standard error)

It gives the dispersion of the errors and is to be compared with the average of the measured values. Usually, the model is satisfactory when the RMSE is less than 10% of the measured values.

• Error distribution and relationship with the input variables

Errors should be normally distributed, with no heteroscedasticity and no autocorrelation; errors should be uncorrelated with the input variables.
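Some of these indices are straightforward to compute from the observed and predicted values. The following sketch is an illustration only: the function name `fit_indices` and the data are hypothetical, and the RMSE is taken here as the residual standard error, i.e. with n minus the number of parameters in the denominator.

```python
import math


def fit_indices(observed, predicted, n_params):
    """R2, adjusted R2, RMSE and RMSE relative to the mean of the observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    rmse = math.sqrt(sse / (n - n_params))      # residual standard error
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse,
            "relative_rmse": rmse / mean_obs}


# Hypothetical observed and predicted values for a two-parameter model
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
indices = fit_indices(observed, predicted, n_params=2)
```

The value `relative_rmse` is the one to compare with the 10% rule of thumb given above.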

(20)

Goodness of fit



Example of normally distributed errors, to be verified with statistical tests (e.g. D’Agostino et al., 1990) and quantile plots

(21)

Goodness of fit



Example errors with

(22)

Goodness of fit



Error of a linear model when the appropriate model is in fact

(23)

Goodness of fit



Do not listen to the siren’s song of the R²!

(24)

Heteroscedasticity

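How to deal with heteroscedasticity?

[Figure: plot of Y against X]

Consider the model

$$Y = a + b\,X + \varepsilon \quad\text{with}\quad \varepsilon \sim N(0,\ \sigma X)$$

Transformation of the variables,

$$Y' = Y/X, \qquad X' = 1/X, \qquad \varepsilon' = \varepsilon/X$$

to get the following linear model:

$$Y' = a\,X' + b + \varepsilon' \quad\text{with}\quad \varepsilon' \sim N(0,\ \sigma)$$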
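The step behind this transformation can be written out explicitly. The derivation below assumes that X > 0 and that the second argument of N(0, ·) on this slide denotes the standard deviation of the error; dividing both sides of the model by X then gives an error with constant standard deviation.

```latex
\begin{align*}
Y &= a + b\,X + \varepsilon, & \operatorname{sd}(\varepsilon) &= \sigma X \\
\frac{Y}{X} &= a\,\frac{1}{X} + b + \frac{\varepsilon}{X} \\
Y' &= a\,X' + b + \varepsilon', & \operatorname{sd}(\varepsilon') &= \frac{\sigma X}{X} = \sigma
\end{align*}
```

Note that the roles of the two parameters are exchanged: in the transformed model, a becomes the slope and b the intercept.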

(25)

Heteroscedasticity



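[Figure: plot of Y against X]

More generally, weighted regression consists in minimizing:

$$\sum_{i=1}^{n} w_i \left(Y_i - Y_{i,\mathrm{pred}}\right)^2 \quad\text{with}\quad w_i = 1/\sigma_i^2$$

where $w_i$ is the weight of observation i. Usually, we use $\sigma_i \propto X_i^{z}$.

The whole challenge consists in finding the appropriate z value.

[Figure: three panels, for z = Zoptimum, z < Zoptimum and z > Zoptimum]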
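For a straight-line model, the weighted criterion has an explicit solution. The sketch below illustrates it with weights proportional to 1/X^(2z), which follows from w = 1/σ² and σ proportional to X^z; the function name `weighted_fit`, the value of z and the data are assumptions made for the example.

```python
def weighted_fit(xs, ys, z):
    """Weighted least squares fit of Y = a + b.X with weights w_i = 1 / X_i^(2z)."""
    ws = [1.0 / x ** (2.0 * z) for x in xs]
    sum_w = sum(ws)
    # Weighted means of X and Y
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sum_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical, error-free data generated with a = 1 and b = 2
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x for x in xs]
a, b = weighted_fit(xs, ys, z=1.0)
```

With z = 0 all the weights are equal to 1 and the function reduces to ordinary least squares.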

(26)

Heteroscedasticity



First option: a rough and simple method that can be used if there are enough data

$$w_i \propto \frac{1}{X_i^{2z}}$$

Step 1 = split the variable X into K classes centered on $X_k$

Step 2 = calculate the variance $\sigma_k^2$ of Y within each of the K classes

Step 3 = linear regression of $\log \sigma_k$ on $\log X_k$; since $\sigma_k \propto X_k^{z}$, the slope of this regression estimates z
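The three steps above can be sketched as follows. This is a rough illustration under stated assumptions: the function name `estimate_z` is hypothetical, the classes are built with equal numbers of observations after sorting on X, each class is centered on its mean X, and the data are constructed so that the standard deviation of Y is exactly proportional to X^1.5.

```python
import math
import statistics


def estimate_z(xs, ys, n_classes):
    """Estimate z as the slope of log(sd of Y) on log(X) across classes of X."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // n_classes
    log_x, log_sd = [], []
    for k in range(n_classes):
        start = k * size
        end = len(pairs) if k == n_classes - 1 else start + size
        chunk = pairs[start:end]
        x_k = statistics.mean([p[0] for p in chunk])      # step 1: class center
        sd_k = statistics.stdev([p[1] for p in chunk])    # step 2: spread of Y
        log_x.append(math.log(x_k))
        log_sd.append(math.log(sd_k))
    # Step 3: slope of the regression of log(sd_k) on log(X_k)
    mean_x = statistics.mean(log_x)
    mean_s = statistics.mean(log_sd)
    return (sum((u - mean_x) * (v - mean_s) for u, v in zip(log_x, log_sd))
            / sum((u - mean_x) ** 2 for u in log_x))


# Hypothetical data: three replicates per X value, spread proportional to X^1.5
levels = [1.0, 2.0, 4.0, 8.0]
xs = [x for x in levels for _ in range(3)]
ys = [10.0 * x + x ** 1.5 * d for x in levels for d in (-1.0, 0.0, 1.0)]
z = estimate_z(xs, ys, n_classes=4)
```

With real data the classes cover a range of X values, so the within-class spread of Y also includes part of the trend, which is one reason why the method is only a rough one.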

(27)

Heteroscedasticity



$$w_i \propto \frac{1}{X_i^{2z}}$$

Step 1 = fit the weighted model with z fixed to a given value (often 0 at the beginning)

Step 2 = calculate the Furnival index (FI):

$$FI = \operatorname{antilog}\left(\frac{\sum_{k=1}^{n} \log X_k}{n}\right) \cdot RMSE$$

Step 3 = go back to step 1 with an increased value of z

The optimum value of z corresponds to the minimum of the Furnival index.

(28)

Heteroscedasticity



$$w_i \propto \frac{1}{X_i^{2z}}$$

Fitting z together with the other parameters of the model

Fit by maximum likelihood instead of least squares:

$$ML = -\frac{1}{2}\sum_i \left[\left(\frac{Y_i - \mu_i}{\sigma\,X_i^{z}}\right)^2 + \log(2\pi) + 2\,\log\left(\sigma\,X_i^{z}\right)\right]$$

where $\mu_i$ is the model for the mean.
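As an illustration, the log-likelihood to be maximized can be coded as a function of the parameters. The sketch assumes that the likelihood on this slide is the usual Gaussian log-likelihood with mean µ_i and standard deviation σ.X_i^z; the function name `log_likelihood` and the data are hypothetical, and the maximization itself (over the parameters of the mean model, σ and z) is left to a numerical optimizer.

```python
import math


def log_likelihood(ys, mus, xs, sigma, z):
    """Gaussian log-likelihood with mean mu_i and standard deviation sigma.X_i^z."""
    total = 0.0
    for y, mu, x in zip(ys, mus, xs):
        sd = sigma * x ** z
        total += ((y - mu) / sd) ** 2 + math.log(2.0 * math.pi) + 2.0 * math.log(sd)
    return -0.5 * total


# Hypothetical values: predictions equal to or different from the observations
xs = [1.0, 2.0, 4.0]
ys = [3.0, 5.0, 9.0]
perfect = log_likelihood(ys, ys, xs, sigma=1.0, z=0.0)
worse = log_likelihood(ys, [2.0, 6.0, 7.0], xs, sigma=1.0, z=0.0)
```

For a given σ and z, predictions further from the observations give a lower log-likelihood, which is what the optimizer exploits.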

(29)

Model Choice

How to choose between models?

If the number of parameters is the same in model 1 and model 2, use the sum of squared errors (SSE): the model with the lowest SSE is the best.

If the number of parameters differs between model 1 and model 2, check whether the two models are nested or not:

$Y = a + bD + \varepsilon$ is nested in $Y = a + bD + cD^2H + \varepsilon$

$Y = a + bD^2H + \varepsilon$ is nested in $Y = a + bD + cD^2H + \varepsilon$

$Y = a + bD + \varepsilon$ is not nested in $Y = a + cD^2H + \varepsilon$

For nested models: F test using the sums of squared errors (SSE) of the two models, where model 1 has $p_1$ parameters, model 2 has $p_2$ parameters and $p_1 > p_2$:

$$F_{obs} = \frac{(SSE_2 - SSE_1)/(p_1 - p_2)}{SSE_1/(n - p_1)}$$

If $F_{obs} > F_{tab}(p_1 - p_2,\ n - p_1)$, then model 1 is more suitable than model 2.

For non-nested models: AIC and BIC, using the maximum likelihood estimates, where ML is the maximized log-likelihood and p the number of parameters:

$$AIC = -2\,ML + 2\,p \qquad BIC = -2\,ML + p\,\log(n)$$
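The criteria of this slide are simple to compute once the models have been fitted. The sketch below is an illustration with hypothetical function names and made-up values; ML is taken to be the maximized log-likelihood, and the tabulated F value must still be read from a table or computed by statistical software.

```python
import math


def f_observed(sse_1, p_1, sse_2, p_2, n):
    """Observed F for nested models; model 1 is the larger one (p_1 > p_2)."""
    if p_1 <= p_2:
        raise ValueError("model 1 must have more parameters than model 2")
    return ((sse_2 - sse_1) / (p_1 - p_2)) / (sse_1 / (n - p_1))


def aic(ml, p):
    """Akaike information criterion from the maximized log-likelihood ml."""
    return -2.0 * ml + 2.0 * p


def bic(ml, p, n):
    """Bayesian information criterion from the maximized log-likelihood ml."""
    return -2.0 * ml + p * math.log(n)


# Made-up values: 23 observations, a 3-parameter model and a nested 2-parameter model
f_obs = f_observed(sse_1=10.0, p_1=3, sse_2=16.0, p_2=2, n=23)
aic_1 = aic(ml=-5.0, p=3)
bic_1 = bic(ml=-5.0, p=3, n=23)
```

Here `f_obs` would be compared with F_tab(1, 20); for AIC and BIC, the model with the lowest value is preferred.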

(30)

Model Choice

How to choose between models?

$$Y = a + bD + cD^2$$

$$Y = a + bD + cD^2 + \dots + fD^5$$

$$Y = a + bD + cD^2 + \dots + kD^{10}$$

Don’t use the R², because it increases automatically as the number of parameters increases.

(31)

Step by Step



(32)

Aggregation

Example : Eucalyptus in Congo (Saint-André et al. 2005)



Model fitted for each stand age: $\text{LeafBiomass} = a + b\,D^2H + \varepsilon$

[Figure: leaf biomass (kg DM tree⁻¹) against D²H (m³), measured (DM Leaves) and estimated (DM Leaves Est) values, for stand groups GP1-2, GP3A-3B and GP3C-3D (ages 11-30, 50-75 and 135 months)]

[Figure: two panels showing the fitted parameters against stand age (months): one parameter shows no variation with stand age, the other an exponential decrease with stand age]

The model was fitted age by age, then the variations of the parameters with stand age were analysed.

(33)

Aggregation

Example: Eucalyptus in Congo (Saint-André et al. 2005)

Compartment | Stands used for calibration | Model for the mean | Model for the variance
F1 Total | G1, G3A, G3B, G3D | $\mu = 5.53 + 939.11\,(r_{1.3}^2 h)$ | $\varepsilon = 24.19\,(r_{1.3}^2 h)^{0.483}$
F2 Aboveground | G1, G2, G3A, G3B, G3C, G3D, V1 to V9 | $\mu = 2.18 + (488.8 + 2.2\,age)\,(r_{1.3}^2 h)^{(0.87 + 0.0012\,age)}$ | $\varepsilon = 21.74\,(r_{1.3}^2 h)^{0.613}$
F3 Belowground | G1, G3A, G3B, G3D | $\mu = 92.14\,(r_{1.3}^2 h)^{0.630}$ | $\varepsilon = 5.52\,(r_{1.3}^2 h)^{0.385}$
F4 Leaves | G1, G2, G3A, G3B, G3C, G3D, V1 to V9 | $\mu = 0.64 + (20.39 - 0.09\,age + 2344.6\,e^{-0.15\,age})\,r_{1.3}^2 h$ | $\varepsilon = 0.68\,(r_{1.3}^2 h)^{0.232}$
F5 Dead branches | G1, G2, G3A, G3B, G3C, G3D, V1 to V9 | $\mu = (6.12 + 158.9\,e^{-0.03\,age})\,r_{1.3}^2 h$ | $\varepsilon = 3.09\,(r_{1.3}^2 h)^{0.353}$
F6 Living branches | G1, G2, G3A, G3B, G3C, G3D, V1 to V9 | $\mu = (31.12 + 4496.7\,e^{-0.18\,age})\,r_{1.3}^2 h$ | $\varepsilon = 5.20\,(r_{1.3}^2 h)^{0.573}$
F7 Bark | G1, G2, G3A, G3B, G3C, G3D, V1 to V9 | $\mu = (25.95 + 19.83\,e^{-0.05\,age})\,(r_{1.3}^2 h)^{0.761}$ | $\varepsilon = 1.03\,(r_{1.3}^2 h)^{0.402}$
F8 Stem | G1, G2, G3A, G3B, G3C, G3D, V1 to V9 | $\mu = 0.29 + (510.7 + 1.29\,age)\,r_{1.3}^2 h$ | $\varepsilon = 36.02\,(r_{1.3}^2 h)^{0.887}$
F9 Stump | G1, G3A, G3B, G3D | $\mu = 37.72\,(r_{1.3}^2 h)^{0.718}$ | $\varepsilon = 4.27\,(r_{1.3}^2 h)^{0.508}$
F10 Coarse roots | G1, G3A, G3B, G3D | $\mu = 54.84\,(r_{1.3}^2 h)^{0.790}$ | $\varepsilon = 5.05\,(r_{1.3}^2 h)^{0.564}$
F11 Medium roots | G1, G3A, G3B, G3D | $\mu = 2.71\,(r_{1.3}^2 h)^{0.470}$ | $\varepsilon = 0.47\,(r_{1.3}^2 h)^{0.276}$
F12 Fine roots | G1, G3A, G3B, G3D | $\mu = 7.61\,(r_{1.3}^2 h)^{0.297}$ | $\varepsilon = 0.67\,(r_{1.3}^2 h)$

The age effect was significant for most of the compartments. We thus get a set of equations that can be used whatever the stand age (within the range of the calibration data set, 11 to 135 months).

NB: the z values obtained (heteroscedasticity) are very different from 1 or 2 (the values classically used in weighted regressions).

(34)

Aggregation

Example: Fagus in France (Genet et al. 2011)

[Figure: several panels of parameter b (dimensionless) against age (years), comparing Eucalyptus - Congo, Beech - France and Eucalyptus - Brazil]

Not only do eucalyptus and fagus show the same pattern, they also follow the same line! (especially for stem wood and branches)

(35)

Step by Step



(36)

Taking all compartments into account

The equations were fitted simultaneously, in order to take cross-compartment correlations into account. This step is important when one wants to simulate biomass estimates with confidence intervals.

The outputs of SUR (Seemingly Unrelated Regressions) are:

1-Values of parameters and their confidence intervals

2-Correlation matrix of parameters (within compartment and between compartments)

3-Residual errors for each compartment

4-Correlation matrix of errors (between compartments)