• Aucun résultat trouvé

1.1 Data generating processes and regression models

1.1.2 Simulation study

While Figure 1.2 gives already an idea of what a PLM fit can provide using the data, the exact final outcome has to be verified in a simulation study. In order to reproduce the effect of having different type of population or say different link function S, as in Figure 1.1 we generate a random data set with:

Ii(age) = ≠8 + 0.1◊agei+ih; Wi(age) = 4.5≠0.03◊agei+iw; AgeU(20,80),

3We take ’relative wealth’, because people tend to compare their economic situation not with their own past situation but to look at their own present distribution quantile. See McBride[2001].

LS

Illness +

-LS

Age

-

+

Figure 1.2: Bias-transfer from the parametric to the non-parametric part when estimating the components of a PLM. Solid line indicate the true marginal impacts, dashed lines the expected PLM estimates. Left panel: bias when illness is forced to have a linear impact.

Right panel: bias transfer coming from the left panel when age is positively correlated with illness.

of sizen = 1500, whereh andw are i.i.d. error terms with standard normal distribution.

Then the latent variable ,Ui is built using equation (1.4). To transform these values into individual responses we choose four different inverse linksa, that corresponds to the cases in Figure 1.1. In each case this inverse link assigns to the elements of a the equidistant responses {1,2,3, . . . ,9,10},see Figure 1.3.

• Neutral population: a={1.0,2.0,3.0, . . . ,9.0,10.0},

• Extreme averse population: a={1.0,1.4,1.9,2.8,4.2,6.8,8.2,9.1,9.6,10.0},

• Optimists population: a={1.0,1.2,1.6,2.1,2.7,3.6,4.9,6.4,8.2,10.0},

• Pessimistic population: a={1.0,2.8,4.6,6.2,7.3,8.2,8.8,9.3,9.7,10.0}.

We then fit the PLM in equation (1.1) to the data using cubic regression spline basis.

Recall that this was proposed in literature to solve the problem of contradictory outcomes on the curvature as to the age effect on LS. The resulting outcome is shown in Figure 1.4. The interesting point of these simulations is that the different impacts that are found in the literature, namely the linearly decreasing, u-shape, inverse u-shape and most important the one with the hollow of mid-life crisis can be find in a the same scenario, depending on the true link function.

Looking closer at the second case in Figure 1.4 the graph depicts what is referred to as the ’mid-life crisis’, the change in the curvature of the overall LS around the age of 50.

However, in our simulation this outcome is nothing but the transferred bias caused by the misspecification of the parametric part i.e. neglecting the extreme aversion in observed responses.

To stop and conclude here would result in a negative output and somehow destructive criticism. In order to turn it into a more constructive contribution that does not limit to pointing out a misunderstanding of semi-parametric methods and misleading conclusions, we discuss in the next section how such an empirical study could be completed. We do not speak of ’solving the problem’ because as long as the true link function is unknown, the safest way of studying the marginal impact of age on LS (conditional on health or other factors) seems to be the use of purely non-parametric methods. However, if the objective of selectingG(·) is to correct for situations like extreme aversion, i.e. something

Figure 1.3: Simulated link function

that has nothing to do with the perceived utility but how LS is recorded (one might think of a measurement error that is systematically positive for low utilities but negative for large ones), then even the non-parametric methods cannot help. In such a case, identifi-cation requires the imposition of (additional) model assumptions which should be based on studies that investigate this potential measurement error, i.e. the response behavior of people when facing discrete bounded scales. For instance, Mullainathan and Bertrand [2001] have pointed out the necessity of studying people’s attitude toward the subjective and categorical survey questions; see also Krueger and Schkade [2008] regarding the reli-ability of the self-reported subjective well-being or LS. Later Frijters and Beatton [2012]

and Kassenboehmer and Haisken-DeNew [2012] have also pointed out to the problem of controversial results for functional form of LS in age but they have proposed in different ways to solve it by including the panel fixed effect in the model to account for differences in the individual behavior. With the same line of reasoning Wooden and Li [2014] argue that availability of panel structure is necessary for the study of subjective well-being (life-satisfaction). Chadi [2013] provides evidence on the effect of interviewer encounters on the response in the panel studies. They suggest that in the studies of life-satisfaction the interviewer fixed effect must be considered in the model. We argue that these proposed solutions can all be regarded as robustness check in the analysis of life-satisfaction but does not guarantee an unbiased estimation for the functional form of life-satisfaction in age. Main reason is that the misspecification in the link function and or the index is not separable unless additional assumption (restrictions) are imposed.

While focusing on the semi-parametric specification of the life-satisfaction models and letting age to behave non-parametrically, it is also to be mentioned the role of bandwidth selection. It is true that the choice of bandwidth for each non-parametric components of

Figure 1.4: Results of the fitted PLM model, in cases where each of the four links is the true one.

the model determines its degree of smoothness but only to this extent its functional form, i.e. it can make a slope flatter but not change the direction. Typically, these choices are made by optimizing the trade-off between the variance and bias of the estimator; in contrast, at least to our knowledge, there does nothing exist such a thing like a bandwidth choice procedure to prevent a bias transfer between the components of a semi-parametric model. In all the analysis of this chapter the the General Cross Validation method (GCV) is used to choose the optimum bandwidths for the non-parametric components.