Contribution to modeling in archaeology

1.4 Models for dating in archaeology

1.4.3 Contribution to modeling in archaeology

My contributions are the following (Galharret et al., 2019a):

- a modification of the shrinkage prior so that the posterior distribution has finite moments of order p,

- a two-step estimation methodology:

  - the posterior distribution of the hyperparameter σi is used to detect outliers. Indeed, in accordance with (1.20), σi measures the individual dispersion around the central age.

  - The age is then re-estimated on the subsample from which the outliers have been excluded. At this stage, a robust method is therefore no longer needed.

I tested this approach in the same central-age calculation setting as the Event Model (Lanos and Philippe, 2017, 2018). The approach was then extended to the estimation of OSL ages.

The numerical results clearly show that, even though the final estimate is computed on a smaller sample, our approach yields a substantial gain in both bias and variance over the robust method (Event Model).

Chapter 2

Non-parametric Mediation Analysis for direct effect with binary outcomes

Abstract: The main objective of mediation analysis is to decompose the causal effect of X on Y into components that can be explained by the mediator M (denoted by average causal mediation effect or average natural indirect effect or indirect effect) and that cannot be explained by M (average direct causal effect or average natural direct effect or direct effect). A general definition of average natural direct effect (NDE) based on counterfactual variables was introduced by Pearl (2001). In this paper, we propose a non-parametric procedure to test a stronger definition of absence of this average natural direct effect. We show that this definition is equivalent to the absence of average natural direct effect for all distinct values x, x∗ of X in most models of the literature. The simulations show that our approach performs better in non-monotonic situations.

Keywords: bootstrap, non-parametric regression, testing procedure, application to schoolchildren’s well-being.

2.1 Introduction

Within the human sciences, mediation designates a particular causal phenomenon where the effect of a variable X on another variable Y passes (partially or entirely) through a third variable M (see Baron and Kenny (1986)). The study of mediation is particularly popular in psychology, sociology or marketing, as it allows the detection of variables that may trigger specific human behaviors.

In the mediation model, the total effect of X on Y is divided into the influence of X on Y in the presence of M (the direct effect) and the part of this effect that is routed through M (the indirect effect). For instance, Schmader and Johns (2003) have shown that a reduction in working memory capacity mediates the negative effect caused by a stereotype threat on women’s mathematical performance. MacKinnon (2008) compares testing procedures regarding the indirect effect.

[Path diagram: X → M with coefficient a, M → Y with coefficient b, X → Y with coefficient γ]

Figure 2.1 – Summary of the relations between Y, X and M. The direct and indirect effects are defined by γ and ab respectively, according to MacKinnon (2008).

The main objective in the mediation model is to quantify the added effect of X on Y in the presence of M. A natural first step in this direction is to detect the absence of a direct effect altogether, which would signify that X could (and should) be ignored when investigating Y. Detecting the direct effect is generally achieved via a statistical test on the significance of the corresponding coefficient γ in the model. If Y is a continuous variable, the mediation model typically follows a classical linear regression framework:

Y = α + γX + bM + ε,

where ε is a random error uncorrelated with X and M, with zero mean and finite variance. In this model, testing whether there is a direct effect can be achieved by a Student significance test on the coefficient γ. Similarly, the logistic regression model provides a discrete analogue when Y is a categorical variable. The absence of a direct effect is then tested via the likelihood ratio (LR) test (see e.g. Agresti, 2006) or via the Wald test (see Hauck Jr. and Donner, 1977, for example). Other models have been investigated in the literature, such as log-linear, Poisson or negative binomial models; see for instance VanderWeele (2015).
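As an illustration of the Student test on γ in the linear model above, the following minimal Python sketch fits Y = α + γX + bM + ε by ordinary least squares and returns the t statistic of γ. The function name direct_effect_t_stat, the simulated data and all numerical values (a = 0.8, b = 0.7, γ = 0.5, n = 500) are assumptions made for the example; they do not come from the study.

```python
import numpy as np

def direct_effect_t_stat(x, m, y):
    """OLS fit of Y = alpha + gamma*X + b*M + eps; returns (gamma_hat, t statistic)."""
    n = len(y)
    design = np.column_stack([np.ones(n), x, m])    # columns: intercept, X, M
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = n - design.shape[1]                       # residual degrees of freedom
    sigma2 = resid @ resid / dof                    # unbiased estimate of Var(eps)
    cov = sigma2 * np.linalg.inv(design.T @ design) # covariance of the OLS estimator
    gamma_hat = coef[1]
    return gamma_hat, gamma_hat / np.sqrt(cov[1, 1])

# Simulated data; every numerical value below is an arbitrary illustration choice.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
m = 0.8 * x + rng.normal(size=n)                          # X -> M, a = 0.8
y_direct = 1.0 + 0.5 * x + 0.7 * m + rng.normal(size=n)   # gamma = 0.5
y_nodirect = 1.0 + 0.7 * m + rng.normal(size=n)           # gamma = 0

g1, t1 = direct_effect_t_stat(x, m, y_direct)
g0, t0 = direct_effect_t_stat(x, m, y_nodirect)
print("with direct effect:    gamma_hat = %.3f, t = %.2f" % (g1, t1))
print("without direct effect: gamma_hat = %.3f, t = %.2f" % (g0, t0))
```

Under the null hypothesis γ = 0 and Gaussian errors, the statistic follows a Student distribution with n − 3 degrees of freedom, so a large absolute value is evidence of a direct effect.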

In some situations, generalized linear models may fail to detect the dependence of the variables. This is true for instance if the conditional expectation function (also called regression function) ρ : (x, m) ↦ E(Y |X = x, M = m) is non-monotonic with respect to one or more variables. While other parametric models may account for the non-monotonicity, they still rely on structural assumptions that can be hard to check in practice. Purely non-parametric methods thus tend to be more robust in some situations, at the cost of a theoretically lower convergence rate. Non-monotonic behavior appears in many relationships, as illustrated by the following examples. Tong et al. (2008) study the relationship between social attractiveness and the number of Facebook friends. They show that social attractiveness rises with the number of friends, but declines for a large number of friends. Kiviruusu et al. (2015) highlight the existence of a non-linear link between age and self-esteem. The same results are obtained in Erol and Orth (2011), who consider the trajectory of the Big Five personality traits as a function of age.
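The failure described above can be reproduced on simulated data. In the sketch below, Y is binary and depends on X only through X², so the regression function is non-monotonic in x. A logistic model with a linear term in X then yields a Wald statistic close to zero for X, whereas a model that includes X² detects the effect. The data-generating process, the sample size and the helper logistic_wald_z (a plain Newton-Raphson fit) are assumptions chosen for illustration only.

```python
import numpy as np

def logistic_wald_z(design, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; return the Wald z statistics."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        w = p * (1.0 - p)
        info = design.T @ (design * w[:, None])       # Fisher information matrix
        beta = beta + np.linalg.solve(info, design.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-design @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(design.T @ (design * w[:, None]))
    return beta / np.sqrt(np.diag(cov))

# Hypothetical data: the effect of X on Y is symmetric around 0 (non-monotonic).
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
m = x**2 + rng.normal(size=n)
logit = -1.0 + 3.0 * x**2 + 0.5 * m
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)

ones = np.ones(n)
z_linear = logistic_wald_z(np.column_stack([ones, x, m]), y)
z_quadratic = logistic_wald_z(np.column_stack([ones, x, x**2, m]), y)
print("Wald z for X (linear predictor):", z_linear[1])
print("Wald z for X^2 (quadratic predictor):", z_quadratic[2])
```

The quadratic model succeeds here only because the form of the non-monotonicity is known in advance, which is the kind of structural assumption that is hard to check in practice.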

In Pearl (2001), the average natural direct effect is quantified for any two values x, x∗ of X in a way that is compatible with the non-parametric identification for a binary variable X under sequential ignorability given in Imai and Keele (2010). Since one side of the data is always missing when investigating the direct effect (the counterfactual data are, by nature, not observed), testing the absence of a direct effect is impossible in a general framework. More generally, building tests of conditional independence is known to be a difficult task, as discussed in Shah and Peters (2018). It is however feasible to test whether the conditional expectation given X = x does not depend on x, which we refer to as strong absence of direct effect. This technically stronger condition turns out to be equivalent to the absence of direct effect for all distinct pairs (x, x∗), x ≠ x∗, in most frameworks studied in the literature, including linear models with or without interaction (Hayes, 2018), generalized linear models (VanderWeele, 2015), generalized additive models (Wood, 2006) or multiplicative models. Unlike the natural direct effect, the strong absence of direct effect can be detected from the data in a non-parametric setting.
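For reference, the quantities discussed in this paragraph can be written out as follows. The counterfactual notation Y(x, m) and M(x) is the usual convention and is assumed here, since this excerpt does not introduce it, and x∗ is taken as the reference value. The second line is the standard identification formula, which holds only under assumptions such as sequential ignorability. The third line rests on our reading of strong absence of direct effect as the regression function ρ(x, m) being constant in x.

```latex
\begin{align*}
\mathrm{NDE}(x, x^*) &= \mathbb{E}\bigl[Y(x, M(x^*))\bigr] - \mathbb{E}\bigl[Y(x^*, M(x^*))\bigr], \\
\mathrm{NDE}(x, x^*) &= \int \bigl[\rho(x, m) - \rho(x^*, m)\bigr] \, dF_{M \mid X = x^*}(m)
  \quad \text{(under sequential ignorability)}, \\
\rho(x, m) = \rho(x^*, m) \ \text{for all } x, x^*, m
  \quad &\Longrightarrow \quad \mathrm{NDE}(x, x^*) = 0 \ \text{for all } x \neq x^*.
\end{align*}
```

The implication in the last line gives one direction only: the integral can vanish without the integrand being identically zero, which is why the condition on ρ is the stronger of the two.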

We present a non-parametric test procedure to infer on the strong absence of a direct effect in the general mediation model. The test statistic is obtained from the Nadaraya-Watson estimators of Y conditionally on (X, M) and on M. Although the theoretical distribution of the test statistic under the null hypothesis is unknown, it is possible to approximate it by a bootstrap procedure, thus providing an approximation of the p-value. A real data application to students’ performance linked to well-being and self-efficacy is presented. We show that the conclusions regarding the existence of a strong direct effect may differ depending on whether the considered model is linear (in this case, the logistic regression model) or not. A comparative study of the two test procedures is carried out on simulated data in both a linear and a non-linear framework. This study reveals that the logistic model may misread the causal effect in the data if the linearity assumption is not satisfied, and more particularly in the absence of a monotonic effect of X on Y. On the contrary, the performance of the non-linear test procedure remains comparable to that of the parametric tests in the logistic regression setting. We also note that the comparison between the two parametric tests is in favor of the LR test in terms of power for small samples, in agreement with the published literature (see e.g. Harrell, 2006).
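The general idea of such a procedure can be sketched in Python. The exact test statistic and bootstrap scheme are not specified in this introduction, so the code below is only a plausible stand-in under explicit assumptions: a Gaussian product kernel with a single bandwidth h, the statistic taken as the mean squared difference between the two Nadaraya-Watson fits, and a parametric bootstrap that redraws the binary Y from the fit given M alone. The names nw_smoother and bootstrap_pvalue and the simulated data are hypothetical.

```python
import numpy as np

def nw_smoother(z, h):
    """Row-normalised Gaussian-kernel weights S; (S @ y)[i] is the
    Nadaraya-Watson estimate of E(Y | Z = z[i])."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    d2 = (((z[:, None, :] - z[None, :, :]) / h) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2)
    return w / w.sum(axis=1, keepdims=True)

def bootstrap_pvalue(x, m, y, h=0.3, n_boot=199, seed=0):
    """Assumed statistic: mean squared gap between the fits on (X, M) and on M."""
    rng = np.random.default_rng(seed)
    s_xm = nw_smoother(np.column_stack([x, m]), h)   # smoother for E(Y | X, M)
    s_m = nw_smoother(m, h)                          # smoother for E(Y | M)

    def stat(resp):
        return np.mean((s_xm @ resp - s_m @ resp) ** 2)

    t_obs = stat(y)
    # Under the null, Y depends on (X, M) only through M: redraw Y from the M-only fit.
    p_null = np.clip(s_m @ y, 0.0, 1.0)
    t_boot = np.array([stat(rng.binomial(1, p_null).astype(float))
                       for _ in range(n_boot)])
    p_value = (1 + np.sum(t_boot >= t_obs)) / (n_boot + 1)
    return t_obs, p_value

# Hypothetical data with a non-monotonic direct effect of X on a binary Y.
rng = np.random.default_rng(2)
n = 400
x = rng.uniform(-1.0, 1.0, size=n)
m = x**2 + rng.normal(size=n)
logit = -1.0 + 3.0 * x**2 + 0.5 * m
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)

t_obs, p_value = bootstrap_pvalue(x, m, y)
print("statistic:", t_obs, "bootstrap p-value:", p_value)
```

Because X and M are held fixed across bootstrap replicates, the two smoothing matrices are computed once and each replicate costs only two matrix-vector products. The bandwidth is fixed arbitrarily here; in practice its choice would need care.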

The paper is organized as follows. In Section 2.2 we describe the mathematical formalism behind the non-linear mediation model, whose definition relies on the joint distribution of the variables. We show that this model effectively generalizes the linear mediation model. Indeed, a direct effect in a linear scenario results in a direct effect in the general setting, while the converse may not hold. The extension of the significance test for a direct effect to the non-linear setting is then developed. Finally, the statistical hypothesis testing is applied to numerical examples in Section 2.4, both on simulated and real data.

From the document Statistique bayésienne appliquée en sciences humaines et sociales (pages 36-41)