• Aucun résultat trouvé

Spatial autocorrelation: robustness of measures and tests

N/A
N/A
Protected

Academic year: 2021

Partager "Spatial autocorrelation: robustness of measures and tests"

Copied!
60
0
0

Texte intégral

(1)

robustness of measures and tests

Marie Ernst and Gentiane Haesbroeck

University of Liege

(2)

Spatial data :

• geographical positions

• non spatial attributes

Examplea

Mean price of land for sales (e/m2) in Belgian

municipalities

a

(3)

Spatial data :

• geographical positions

• non spatial attributes

Examplea

Mean price of land for sales (e/m2) in Walloon

municipalities

a

(4)

Positive spatial autocorrelation Negative spatial autocorrelation No spatial autocorrelation Weighting matrix W • Binary weights, • Row-standardized, • Globally standardized, . . .

Convention: zero diagonal and S0 =Pi

P

(5)

Positive spatial autocorrelation Negative spatial autocorrelation No spatial autocorrelation Weighting matrix W • Binary weights, • Row-standardized, • Globally standardized, . . .

Convention: zero diagonal and S0 =Pi

P

(6)
(7)

The values z1, . . . , zn are observed at locations s1, . . . , sn Moran’s Index (1950) I (z) = n S0 Pn i =1 Pn j =1wij(zi − ¯z)(zj − ¯z) Pn i =1(zi − ¯z)2

(8)

Moran’s Index (1950) I (z) = n S0 Pn i =1 Pn j =1wij(zi − ¯z)(zj − ¯z) Pn i =1(zi − ¯z)2

(9)

The values z1, . . . , zn are observed at locations s1, . . . , sn Moran’s Index (1950) I (z) = n S0 Pn i =1 Pn j =1wij(zi − ¯z)(zj − ¯z) Pn i =1(zi − ¯z)2 Geary’s ratio (1954) c(z) = n − 1 2S0 Pn i =1 Pn j =1wij(zi− zj)2 Pn i =1(zi− ¯z)2

(10)

Moran’s Index (1950) I (z) = n S0 Pn i =1 Pn j =1wij(zi − ¯z)(zj − ¯z) Pn i =1(zi − ¯z)2 Geary’s ratio (1954) c(z) = n − 1 2S0 Pn i =1 Pn j =1wij(zi− zj)2 Pn i =1(zi− ¯z)2

(11)

The values z1, . . . , zn are observed at locations s1, . . . , sn Moran’s Index (1950) I (z) = n S0 Pn i =1 Pn j =1wij(zi − ¯z)(zj − ¯z) Pn i =1(zi − ¯z)2 Geary’s ratio (1954) c(z) = n − 1 2S0 Pn i =1 Pn j =1wij(zi− zj)2 Pn i =1(zi− ¯z)2

Getis and Ord’s statistics (1992)

G (z) = Pn i =1 Pn j =1wijzizj Pn i =1 Pn j =1,j 6=izizj

(12)

Moran’s Index (1950) I (z) = n S0 Pn i =1 Pn j =1wij(zi − ¯z)(zj − ¯z) Pn i =1(zi − ¯z)2 Geary’s ratio (1954) c(z) = n − 1 2S0 Pn i =1 Pn j =1wij(zi− zj)2 Pn i =1(zi− ¯z)2

Getis and Ord’s statistics (1992)

G (z) = Pn i =1 Pn j =1wijzizj Pn i =1 Pn j =1,j 6=izizj

(13)

Tests based on asymptotic normality

Without spatial autocorrelation, I , c and G are asymptotically Gaussian under normality (N) and/or randomisation (R) assumption.

Package spdep in R:

• moran.test: under N and R assumption

• geary.test: under N and R assumption

• globalG.test: under R assumption

Example

Statistic Expected value Variance p-value I = 0.53 -0.0038 0.0006 < 2 × 10−16

c = 0.42 1 0.0008 < 2 × 10−16

(14)

Without spatial autocorrelation, I , c and G are asymptotically Gaussian under normality (N) and/or randomisation (R) assumption.

Package spdep in R:

• moran.test: under N and R assumption

• geary.test: under N and R assumption

• globalG.test: under R assumption

Example

Statistic Expected value Variance p-value I = 0.53 -0.0038 0.0006 < 2 × 10−16

c = 0.42 1 0.0008 < 2 × 10−16

(15)

Tests based on asymptotic normality

Without spatial autocorrelation, I , c and G are asymptotically Gaussian under normality (N) and/or randomisation (R) assumption.

Package spdep in R:

• moran.test: under N and R assumption

• geary.test: under N and R assumption

• globalG.test: under R assumption

Example

Statistic Expected value Variance p-value I = 0.53 -0.0038 0.0006 < 2 × 10−16

c = 0.42 1 0.0008 < 2 × 10−16

(16)

Computation of the index using random permutations of the variable on the spatial domain.

(17)

Permutation tests

Computation of the index using random permutations of the variable on the spatial domain.

1 2 3

4 5 6

⇒ 2 5 6

(18)

Computation of the index using random permutations of the variable on the spatial domain.

1 2 3

4 5 6

⇒ 2 5 6

1 3 4

p-value = #{Ipermuted ≥ Iobs} #{simulations} + 1

(19)

Permutation tests

Computation of the index using random permutations of the variable on the spatial domain.

1 2 3

4 5 6

⇒ 2 5 6

1 3 4

p-value = #{Ipermuted ≥ Iobs} #{simulations} + 1

• Benefits: No assumption on the distribution

• Drawbacks : Simulations (randomness, computational cost,...)

(20)

Computation of the index using random permutations of the variable on the spatial domain.

p-value = #{Ipermuted ≥ Iobs} #{simulations} + 1

• Benefits: No assumption on the distribution

• Drawbacks : Simulations (randomness, computational cost,...)

Example (nsim=5000)

Statistic Observed rank p-value Package spdep in R:

I = 0.53 5001 2 × 10−4 moran.mc

c = 0.42 1 2 × 10−4 geary.mc

(21)

Dray’s test based on I

Decomposition of positive / negative influence on autocorrelation

I (z) = X I (uk)<E [I ] I (uk)cor2(uk, z) | {z } + X I (uk)>E [I ] I (uk)cor2(uk, z) | {z } SI−(z) SI+(z)

where uk are eigenvector of HWH, H = In−n1110.

• Simultaneous positive and negative spatial autocorrelation

• Permutation test

Example (nsim=5000)

Statistic Observed rank p-value I = 0.53 SI+= 0.57 5001 2 × 10−4

(22)

Decomposition of positive / negative influence on autocorrelation I (z) = X I (uk)<E [I ] I (uk)cor2(uk, z) | {z } + X I (uk)>E [I ] I (uk)cor2(uk, z) | {z } SI−(z) SI+(z)

where uk are eigenvector of HWH, H = In−n1110.

• Simultaneous positive and negative spatial autocorrelation

• Permutation test

Example (nsim=5000)

Statistic Observed rank p-value

I = 0.53 SI+= 0.57 5001 2 × 10−4

(23)

Moran’s I Geary’s c Getis and Ord’s G Test under R Test under R Test under R Test under N Test under N

Permutation test Permutation test Permutation test Dray’s test

(24)

Impact on the decision of the test

(25)

How contamination influences the result of the tests?

Impact on the decision of the test

Example Test p-value I under R < 2 × 10−16 I under N < 2 × 10−16 I Permutation < 2 × 10−4 Dray test < 2 × 10−4 c under R < 2 × 10−16 c under N < 2 × 10−16 c Permutation < 2 × 10−4 G under R < 2 × 10−16 G Permutation < 2 × 10−4

(26)

Impact on the decision of the test Example Test Decision I under R RH0 I under N RH0 I Permutation RH0 Dray test RH0 c under R RH0 c under N RH0 c Permutation RH0 G under R RH0 G Permutation RH0

(27)

How contamination influences the result of the tests?

Impact on the decision of the test

Example Paliseul: 690e/ m2 Test Decision I under R RH0 I under N RH0 I Permutation RH0 Dray test RH0 c under R RH0 c under N RH0 c Permutation FTR G under R RH0 G Permutation RH0

(28)

Impact on the decision of the test Example Paliseul: 730e/ m2 Test Decision I under R RH0 I under N RH0 I Permutation RH0 Dray test RH0 c under R FTR c under N RH0 c Permutation FTR G under R RH0 G Permutation RH0

(29)

How contamination influences the result of the tests?

Impact on the decision of the test

Example Paliseul ≈ max(Belgium) (1200e) Test Decision I under R FTR I under N RH0 I Permutation RH0 Dray test RH0 c under R FTR c under N FTR c Permutation FTR G under R RH0 G Permutation RH0

(30)

Impact on the decision of the test Example Paliseul: 1610e/ m2 Test Decision I under R FTR I under N FTR I Permutation FTR Dray test FTR c under R FTR c under N FTR c Permutation FTR G under R FTR G Permutation FTR

(31)

Resistance of a test (Ylvisaker, 1977)

• Resistance to acceptance: smallest proportion of fixed observations which will always implies the non rejection of H0. • Resistance to rejection: smallest proportion m0 of fixed

observations which will always implies the rejection of H0.

(32)

Impact on the p-value

Hairplot: ξ 7→ p-value(z + ξei) for i = 1, . . . , n

(33)

Impact on the p-value

Hairplot: ξ 7→ p-value(z + ξei) for i = 1, . . . , n

(34)

Impact on the p-value

Hairplot: ξ 7→ p-value(z + ξei) for i = 1, . . . , n

(35)

Impact on the p-value

Hairplot: ξ 7→ p-value(z + ξei) for i = 1, . . . , n

Geary

(36)

Moran can be interpreted as the slope in a LS regression of Wz over z

(37)

Moran scatterplot

Moran can be interpreted as the slope in a LS regression of Wz over z

(38)

Moran can be interpreted as the slope in a LS regression of Wz over z

Idea: Robustly estimate this slope

Robust techniques

• Least Trimmed Squares (Rousseeuw, 1984)

(39)

Regression: y = ax + b and residuals: ri = yi− ˆyi

Least squares regression Minimisation ofPn

i =1ri2

Least Trimmed Squares (Rousseeuw, 1984) Minimisation of the trimmed sum: Pn−h

i =1 r(i )2

M-estimator (Huber, 1973)

Minimisation of another function: Pn

i =1ρ ri ˆ σ  e.g., ρ(r ) = r2, ρ(r ) = |r |

(40)

Regression: y = ax + b and residuals: ri = yi− ˆyi

Least squares regression

Minimisation ofPn

i =1ri2

Least Trimmed Squares (Rousseeuw, 1984) Minimisation of the trimmed sum: Pn−h

i =1 r(i )2

M-estimator (Huber, 1973)

Minimisation of another function: Pn

i =1ρ ri ˆ σ  e.g., ρ(r ) = r2, ρ(r ) = |r |

(41)

Regression: y = ax + b and residuals: ri = yi− ˆyi

Least squares regression

Minimisation ofPn

i =1ri2

Least Trimmed Squares (Rousseeuw, 1984)

Minimisation of the trimmed sum: Pn−h

i =1 r(i )2

M-estimator (Huber, 1973)

Minimisation of another function: Pn

i =1ρ ri ˆ σ  e.g., ρ(r ) = r2, ρ(r ) = |r |

(42)

Regression: y = ax + b and residuals: ri = yi− ˆyi

Least squares regression

Minimisation ofPn

i =1ri2

Least Trimmed Squares (Rousseeuw, 1984)

Minimisation of the trimmed sum: Pn−h

i =1 r(i )2

M-estimator (Huber, 1973)

Minimisation of another function: Pn

i =1ρ ri ˆ σ  e.g., ρ(r ) = r2, ρ(r ) = |r |

(43)

Impact on the decision of the test: resistance

Linked to the Breakdown point of the robust regression Least Trimmed Squares M-estimator

h/n ≤ 0.5

(44)

Impact on the decision of the test: resistance

Linked to the Breakdown point of the robust regression Least Trimmed Squares M-estimator

h/n ≤ 0.5

(45)

Impact on the decision of the test: resistance

Linked to the Breakdown point of the robust regression Least Trimmed Squares M-estimator

h/n ≤ 0.5

Impact on the p-value: hairplot

(46)

Linked to the Breakdown point of the robust regression Least Trimmed Squares M-estimator

h/n ≤ 0.5

Impact on the p-value: hairplot

(47)

Model

The Spatial autoregressive (SAR) model: Z = ρWZ + ε

Simulation settings (Holmberg and Lundevaller, 2015)

• Spatial domain: grid (n = 100, 400, 900, 1600)

• Weighting matrix: queen and rook contiguity

• Spatial correlation coefficient: ρ = 0, 0.1, 0.2, 0.3

• Distribution of ε: N(0, 1), P(1), Bern(0.5)

Contamination process

(48)

Example

n = 100, queen contiguity, ρ = 0.2 and ε ∼ N(0, 1)

(49)

1. Without contamination (δ = 0)

Similar results are expected for classical and robust tests

2. With contamination (δ > 0)

(50)

Robust tests vs initial tests

Good correlation between the p-value of robust and classical Moran’s tests (≥ 90%) Level (n = 100) ε W Moran M-estimator LTS MAD Huber 75% 95% N(0,1) Queen 0.055 0.061 0.062 0.072 0.073 N(0,1) Rook 0.051 0.051 0.050 0.053 0.054 P(1) Queen 0.044 0.054 0.053 0.080 0.063 P(1) Rook 0.052 0.056 0.055 0.057 0.056 Bern(0.5) Queen 0.067 0.086 0.087 0.091 0.091 Bern(0.5) Rook 0.041 0.061 0.050 0.076 0.076

(51)

Power

(52)
(53)

500 simulations for n = 100, ε ∼ N(0, 1) Level δ W Moran M-estimator LTS MAD Huber 75% 95% 1 Queen 0.064 0.072 0.072 0.140 0.126 1 Rook 0.026 0.030 0.030 0.094 0.046 2 Queen 0.050 0.068 0.068 0.134 0.090 2 Rook 0.040 0.052 0.052 0.096 0.064 3 Queen 0.054 0.062 0.060 0.104 0.090 3 Rook 0.054 0.076 0.076 0.086 0.098 4 Queen 0.026 0.048 0.048 0.092 0.064 4 Rook 0.050 0.064 0.062 0.090 0.080 5 Queen 0.058 0.072 0.070 0.124 0.126 5 Rook 0.040 0.058 0.058 0.062 0.054 10 Queen 0.052 0.060 0.060 0.072 0.070 10 Rook 0.068 0.074 0.070 0.078 0.068

(54)

500 simulations for n = 100, ε ∼ N(0, 1)

(55)

500 simulations for n = 100, ε ∼ N(0, 1)

(56)

500 simulations for n = 100, ε ∼ N(0, 1)

(57)

500 simulations for n = 100, ε ∼ N(0, 1)

Preliminary observations

• Robust test using Least Trimmed Square gives good results

• Robust test using M-estimator is more sensitive to leverage points

(58)

In progress

• Simulations and comparison need to pursue

• Proposition of a robust test based on the others statistics

Perspective

(59)

In progress

• Simulations and comparison need to pursue

• Proposition of a robust test based on the others statistics For Geary’s c:

• γ(h) = Sˆ z2c for ˆγ(h) the empirical variogram

• Robust estimation of the variogram using Qn

• c ∝ˆ 12Qn(Vn)

Qn(z)

2

Perspective

(60)

• Geary (1954). The contiguity ratio and statistical mapping, The inc. Statistician. • Getis and Ord (1992). The analysis of spatial association by use of distance

statistics. Geographical analysis.

• Holmberg and Lundevaller (2015). A test for robust detection of residual spatial autocorrelation [...]. Spatial Statistics.

• Moran (1950). Notes on Continuous Stochastic Phenomena. Biometrika. Robustness:

• Genton (1998). Spatial BDP of variogram estimators. Math Geol.

• Genton and Lucas (2003). Comprehensive Definitions of Breakdown Points for Independent and Dependent Observations. J R Stat Society.

• Genton and Ruiz-Gazen (2010). Visualizing influential observations in dependent data. Journal of computational and graphical statistics.

• Huber (1973). Robust regression [...]. Annals of Statistics.

• Lark (2008). Some Results on the Spatial Breakdown Point of Robust Point Estimates of the Variogram. Mathematical geosciences.

• Rousseeuw (1984). Least Median of Squares Regression. J Am stat assoc. • Ylvisaker (1977). Test resistance. J Am stat assoc.

Références

Documents relatifs

En ce qui concerne la perception externe de la réussite scolaire, on observe uniquement un effet du type de population (EIP/typique), F(1-93)= 3,397, p&lt;.069 (tendance

review :ةمدقم يف ةعونتملاو اهنومضم يف ةفلتخملا تايرظنلا نم ديدعلا روهظ نمزلا ربع يعامتجلإا ريظنتلا فرع يف ةيعانصلا ةروثلا مايقو اسنرف يف

L’AUDPC est utilisée pour quantifier l’intensité de la maladie (Madden et al., 2007).les AUDPC faibles représentent une lente progression de la maladie et une

This distribution is not considered in a large number of classic methods based on spatially weighted means used in spatial analysis, such as spatial autocorrelation indices,

Sporadic meteors studied using the Arecibo Observatory have indicated the presence of a component that consists of very fast (≥50 km/s), very light (1 µg) meteoroids (Janches et

Summary - The allelic frequencies of haemoglobin, protein X, arylesterase, transferrin, carbonic anhydrase and albumin in 71 European ovine populations were studied

اره نأ امي جماهسب حرتهالإا يئادخبالا تظماخلا تىظلا رُملاج زحكدخل َلذ و تًدُهمخلا باػللأاب ءيلم ( 10 - 11 ) ؽاسقلإ تىط دُػب و مىشم بلان يف

Early after virus inoculation, they did not observe any difference in the viral titers measured in various organs in presence or absence of saliva whereas after 7 days of