
In the following Section, we show that the conditional expectations of the numerator and denominator of the F statistic can be computed analytically for the manly, kennedy, freedman lane and terBraak methods of Table 1.1. Cox and Hinkley (1979) already proposed computing moments of the permutation distribution when testing linear dependency between two variables. The results we present in the next Section allow us to identify the problem with the kennedy permutation method when using the F statistic: it is well known that the kennedy method increases the type I error rate in finite samples (Anderson and Legendre, 1999). Moreover, they imply a natural correction for the kennedy method. In addition, computing conditional expectations gives insight into why the manly method performs well in simulation studies despite not reducing the effect of nuisance variables. Finally, using a simulation study, we highlight the problem of the kennedy method and show the improvement of the type I error rate obtained with this natural correction.

Given $y$, the equation of the model 3.1 is:

$$Y \mid y = y = D\eta + X\beta + e, \tag{3.7}$$

where $y$ is the observed realisation of $Y$ and $e$ is the unknown realisation of $\epsilon$.

Lemma 1. For the linear model of Equation 3.1, over all permutations, the conditional expectations of the numerator and denominator of the F statistic given the observation of $y$ are summarized in the following table.

| | Manly | Freedman-Lane | terBraak | Kennedy |
|---|---|---|---|---|
| Numerator | $\frac{1}{n-1} y^\top R_1 y$ | $\frac{1}{n-1} y^\top R_D y$ | $\frac{1}{n-1} y^\top R_{D,X} y$ | $\frac{1}{n-1} y^\top R_D y$ |
| Denominator | $\frac{1}{n-1} y^\top R_1 y$ | $\frac{1}{n-1} y^\top R_D y$ | $\frac{1}{n-1} y^\top R_{D,X} y$ | $\frac{1}{n-1} \frac{n-p+q}{n-p} y^\top R_D y$ |

The conditional expectations in Lemma 1 are equal for the numerator and denominator when using the manly, freedman lane and terBraak methods (as in the parametric setting under normality and homoscedasticity assumptions). However, for the kennedy method, the conditional expectation of the denominator is larger than that of the numerator. Note that $y^\top R_1 y$ is the sum of squares entering the empirical variance of the vector $y$, and $y^\top R_D y$ and $y^\top R_{D,X} y$ are the sums of squares of the residuals of the "small" and "full" models, respectively. Loosely speaking, manly inflates the numerator compared to the parametric expectation, but this is compensated by the same inflation in the denominator. This is formalized in Section 3.4.
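The manly column of Lemma 1 can be checked numerically for a small sample by enumerating all $n!$ permutations. The sketch below is only an illustration under stated assumptions: the data are made up, D (with q columns, including an intercept) holds the nuisance variables, X (with p − q columns) holds the tested variables, the numerator is divided by p − q and the denominator by n − p, and the helper names (`orthonormalise`, `manly_expectations`) are hypothetical.

```python
from itertools import permutations
from math import isclose, sqrt
from statistics import variance


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalise(columns):
    """Gram-Schmidt on linearly independent columns (lists of floats)."""
    basis = []
    for col in columns:
        r = list(col)
        for b in basis:
            c = dot(b, r)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = sqrt(dot(r, r))
        basis.append([ri / norm for ri in r])
    return basis


def manly_expectations(y, D_cols, X_cols):
    """Average numerator and denominator of F over all permutations of y."""
    n, q = len(y), len(D_cols)
    p = q + len(X_cols)
    basis = orthonormalise(D_cols + X_cols)
    basis_d, basis_x = basis[:q], basis[q:]  # basis_x spans R_D X
    num_total = den_total = 0.0
    n_perm = 0
    for y_star in permutations(y):  # manly: permute the raw response
        ss_d = sum(dot(b, y_star) ** 2 for b in basis_d)
        ss_x = sum(dot(b, y_star) ** 2 for b in basis_x)
        rss = dot(y_star, y_star) - ss_d - ss_x  # y*' R_{D,X} y*
        num_total += ss_x / (p - q)
        den_total += rss / (n - p)
        n_perm += 1
    return num_total / n_perm, den_total / n_perm


# Made-up data: n = 6, D = [intercept, d], X = [x].
y = [1.2, -0.7, 3.1, 0.4, 2.5, -1.9]
D_cols = [[1.0] * 6, [0.5, 1.5, -1.0, 2.0, 0.0, -0.5]]
X_cols = [[1.0, 0.0, 0.0, 1.0, 1.0, 0.0]]

num, den = manly_expectations(y, D_cols, X_cols)
target = variance(y)  # equals y' R_1 y / (n - 1)
print(num, den, target)
```

Both averages agree with the empirical variance of y up to floating-point error, which is the equality of numerator and denominator stated for manly. Exhaustive enumeration is only practical for very small n.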

3.3.1 Proof of Lemma 1 for the manly Permutation Method

Given the observation $y$ of the response variable in Equation 3.1 and the set of all $n!$ permutation matrices of size $n \times n$, $\mathcal{P} = \{P_1, \dots, P_{n!}\}$, the manly permutation method is summarized by the transformation of the data $\{y, D, X\} \to \{Py, D, X\}$. Using this transformation, we define the conditional multivariate distribution by permutation given the observation $y$ by assigning the same probability to all permutations of the vector $y$:

$$\Pr\big((Y \mid y) = Py\big) = \frac{1}{n!} \quad \forall P \in \mathcal{P}. \tag{3.8}$$
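For a very small $n$, the distribution in Equation 3.8 can be written out explicitly by listing every permutation of $y$ with weight $1/n!$. The sketch below uses made-up data and computes the mean vector and covariance matrix of this distribution by direct averaging. The closed forms used for comparison in the comments (every component has mean $\bar{y}$, and the covariance is $\frac{1}{n-1} y^\top R_1 y \, R_1$) are standard results for sampling without replacement, quoted here as a check and not taken from the text.

```python
from itertools import permutations
from math import isclose
from statistics import mean, variance

y = [2.0, -1.0, 0.5, 3.5]  # made-up observation
n = len(y)

# All n! equally likely values of Y | y (Equation 3.8).
perms = list(permutations(y))
weight = 1 / len(perms)

# Mean vector: average each coordinate over all permutations.
exp_Y = [sum(weight * perm[i] for perm in perms) for i in range(n)]

# Covariance matrix: average of centred cross-products.
cov = [
    [
        sum(weight * (perm[i] - exp_Y[i]) * (perm[j] - exp_Y[j]) for perm in perms)
        for j in range(n)
    ]
    for i in range(n)
]

s2 = variance(y)  # y' R_1 y / (n - 1)
print(exp_Y)      # each entry equals mean(y)
print(cov[0][0])  # equals s2 * (1 - 1/n), the diagonal of s2 * R_1
print(cov[0][1])  # equals -s2 / n, the off-diagonal of s2 * R_1
```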

Then, the conditional expectation over all permutations of $Y \mid y$ is simply:

$$E_P[Y \mid y] = \sum_{P \in \mathcal{P}} \frac{1}{n!} P y.$$

Moreover, by using the result of Equation C.2, we also compute its conditional variance over all permutations. Finally, using the definition of the conditional distribution by permutation in Equation 3.8 and the property derived in Equation C.3 of Appendix C.1.1, we compute the conditional expectation of the numerator of the F statistic, which equals $\frac{1}{n-1} y^\top R_1 y$ as stated in Lemma 1, and, for its denominator:

$$E_P\left[\frac{1}{n-p} Y^{*\top} R_{D,X} Y^{*} \,\middle|\, y\right] = \frac{1}{n-1} y^\top R_1 y. \tag{3.12}$$

In Appendix C.4 we derive the conditional expectations by permutation of the numerator and denominator for the kennedy, freedman lane and terBraak methods, which are reported in Lemma 1.

Table 3.1: Type I error rate of tests of one or two parameters simultaneously (column p−q) for regression models with 2 different sample sizes (column n). If the ratio $\frac{n-p+q}{n-p}$ increases (column ratio), the discrepancy between the type I error rate of the kennedy method and the nominal level also increases. Correcting the kennedy method by the inverse of the ratio produces a type I error rate closer to the nominal level (Ken. Corr.), like the manly, terBraak and freedman lane methods (and the parametric test). Confidence intervals are computed using Agresti and Coull (1998). Bold font corresponds to nominal level (5%) within the confidence interval and red font corresponds to confidence interval above the nominal level.

| n | p−q | ratio | Parametric | Manly | Freedman-Lane | terBraak | Kennedy | Ken. Corr. |
|---|---|---|---|---|---|---|---|---|
| 10 | 1 | 1.8 | .045 [.039; .052] | .043 [.037; .050] | .047 [.041; .054] | .045 [.039; .052] | .121 [.111; .132] | .053 [.046; .060] |
| 10 | 2 | 1.6 | .050 [.043; .057] | .048 [.042; .055] | .050 [.044; .058] | .050 [.043; .057] | .119 [.109; .129] | .055 [.048; .063] |
| 20 | 1 | 1.27 | .048 [.042; .056] | .047 [.041; .054] | .048 [.042; .056] | .049 [.043; .056] | .073 [.066; .082] | .046 [.040; .054] |
| 20 | 2 | 1.2 | .050 [.044; .058] | .050 [.044; .058] | .051 [.045; .059] | .051 [.044; .058] | .069 [.062; .077] | .048 [.042; .055] |
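As a small worked example, the ratio column of Table 3.1 can be reproduced from the factor $\frac{n-p+q}{n-p}$ of Lemma 1. The total number of parameters p is not stated alongside the table; the value p = 5 used below is an assumption, chosen because it reproduces the four reported ratios. The function name `kennedy_ratio` is hypothetical.

```python
def kennedy_ratio(n, p, q):
    """Factor by which the kennedy denominator exceeds the numerator in expectation."""
    return (n - p + q) / (n - p)


P_TOTAL = 5  # assumption: not given in the text, inferred from the ratio column

ratios = {
    (n, tested): round(kennedy_ratio(n, P_TOTAL, P_TOTAL - tested), 2)
    for n in (10, 20)
    for tested in (1, 2)  # tested = p - q, the number of parameters under test
}

for (n, tested), ratio in ratios.items():
    print(f"n={n:2d}  p-q={tested}  ratio={ratio}")
```

The factor shrinks towards 1 as n grows with p and q fixed, in line with the smaller discrepancy reported for n = 20 than for n = 10.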

3.3.2 A Modification of the kennedy Method

For the manly, freedman lane and terBraak methods, both the numerator and the denominator of the F statistic have the same expectation, similarly to the parametric setting. For the kennedy method, the numerator and the denominator do not share the same conditional expectation, and we expect that this could lead to an incorrect type I error rate for small samples. However, we can correct the conditional distribution so that both have the same expectation by dividing the denominator of the permuted statistics by the factor $\frac{n-p+q}{n-p}$ (while leaving the observed statistic unchanged). We use a simulation study to test the effect of this correction of the kennedy method.
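The following is a minimal sketch of how such a correction could be applied in practice, not the implementation used for the simulations. It assumes a common formulation of the kennedy method (residualise both $y$ and $X$ on $D$, permute the residualised response, and compute the F statistic with $p-q$ and $n-p$ degrees of freedom), and it reads the correction as dividing the denominator of each permuted statistic by $\frac{n-p+q}{n-p}$, which is what equalises the two expectations in Lemma 1 and amounts to multiplying each permuted F statistic by that factor. All function names, the simulated data and the use of random rather than exhaustive permutations are illustrative assumptions.

```python
import random
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualise(v, basis):
    """Remove from v its projection on an orthonormal basis."""
    r = list(v)
    for b in basis:
        c = dot(b, r)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return r


def orthonormalise(columns):
    """Gram-Schmidt on linearly independent columns."""
    basis = []
    for col in columns:
        r = residualise(col, basis)
        norm = sqrt(dot(r, r))
        basis.append([ri / norm for ri in r])
    return basis


def f_stat(y, basis_x, df_num, df_den):
    """F-type ratio for y regressed on span(basis_x)."""
    ss_x = sum(dot(b, y) ** 2 for b in basis_x)
    rss = dot(y, y) - ss_x
    return (ss_x / df_num) / (rss / df_den)


def kennedy_pvalues(y, D_cols, X_cols, n_perm=2000, seed=0):
    """Return (uncorrected, corrected) permutation p-values."""
    n, q = len(y), len(D_cols)
    p = q + len(X_cols)
    basis_d = orthonormalise(D_cols)
    ry = residualise(y, basis_d)                          # R_D y
    basis_rx = orthonormalise([residualise(x, basis_d) for x in X_cols])  # R_D X
    ratio = (n - p + q) / (n - p)
    f_obs = f_stat(ry, basis_rx, p - q, n - p)            # observed, unchanged
    rng = random.Random(seed)
    raw = corrected = 0
    for _ in range(n_perm):
        y_star = ry[:]
        rng.shuffle(y_star)                               # permute residuals
        f_star = f_stat(y_star, basis_rx, p - q, n - p)
        raw += f_star >= f_obs
        corrected += f_star * ratio >= f_obs              # denominator / ratio
    return raw / n_perm, corrected / n_perm


# Simulated null data: n = 10, four nuisance columns (with intercept), one tested.
rng = random.Random(1)
n = 10
D_cols = [[1.0] * n] + [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
X_cols = [[rng.gauss(0, 1) for _ in range(n)]]
y = [rng.gauss(0, 1) for _ in range(n)]

p_raw, p_corr = kennedy_pvalues(y, D_cols, X_cols)
print(p_raw, p_corr)
```

Because the factor exceeds 1, the corrected p-value is never smaller than the uncorrected one, which is the direction needed to bring an inflated type I error rate back towards the nominal level.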

Extensive simulations for several permutation methods, under several conditions, have already been proposed. Anderson and Legendre (1999) propose a simulation study showing the type I error rate of the kennedy, freedman lane and terBraak methods and already highlight the problems with the kennedy method. Kherad Pajouh and Renaud (2010) add the huh jhun method to their simulation study and show the type I error rate under several distributions of the error terms. They show that the huh jhun method keeps a type I error rate close to the nominal level even under non-normality. Moreover, Winkler et al. (2014) test, in addition, the manly and dekker (noted "Smith" and attributed to O'Gorman (2005) and Nichols et al. (2008)) permutation methods, and also use the F
