• Aucun résultat trouvé

Life-Satisfaction in Germany: Empirical results

We first replicate the study of Wunder et al. [2013] to afterwards explore to what extent other model specifications would lead to different conclusions. That is, we are interested to see whether the conclusion of having detected the mid-life crisis is a by product of misspecification of the model and thus a transmission of the bias to the non-parametric part.

The data in hand is the German Socio-Economic panel (SOEP, hereafter). Regardless of 30 rounds of available data, as in Wunder et al. [2013], we use only the data from 1986 to 2007 without the years 1990 and 1993 for which the number of nights in the hospital was not registered. The response of interest is the “Overall Life-Satisfaction” (LS). That is a discrete categorical response, evaluated by the respondents on a scale from 0 to 10 where ‘0’ corresponds to ‘completely dissatisfied’ and ‘10’ to ‘completely satisfied’. A histogram of the responses polling all the data for the course of this study is shown in Figure 1.5; it illustrates that the unconditional distribution of the responses is strongly

skewed to the left having its mode at 8.

Figure 1.5: Observed responses to “overall life satisfaction”

The main interest is to estimate the marginal impact of age on LS where age ranges from 16 to 100 with a mean around 46 . The other covariates included are personal char-acteristics that typically are used in similar studies; such as illness (nights in hospital, disabilities), relative wealth (household income and size, education, nationality, employ-ment status), and family situation. We first fit a PLM as in (1.1), i.e. where the link function is considered to be the identity. The graph in Figure 1.6 presents then the marginal effect of age on LS. A diagnostic check on this model is presented in Figure A.1 (left panel) in the Appendix A; it indicates problems at the tails, maybe due to extreme aversion.

Figure 1.6: Life-Satisfaction over the life span(age) in Germany using PLM model.

Due to our discussions, theoretical derivations and simulation results in the former sections one may be tempted to directly disregard this outcome as a bias-effect transmitted from ignoring the extreme aversion of respondents, or from potential non-linearities in the index. Fortunately, nowadays, it is no longer that hard to check out if a more flexible modeling will produce a different outcome. In the next section we explain a general class of models that we used to preform a robust check on the functional form of the Life Satisfaction in age for different model specifications.

1.2.1 General class of models

To be more specific, let us group the covariates in two sets, namely one comprising the dummy variables, D, and one comprising the others, X, and consider

E[LS|Age, D, X] =G1m(Age) +mx(X) +mI(X, Age) +DT2, (1.5) with mx to be either non-parametric or linear, andmI a non-parametric interaction term (in some specifications set to zero). Inversely to the flexibility we concede to our index function ÷(·), we can allow for flexibility in the link function G.

We are aware of the fact that there exist many more model specifications than those comprised in (1.5) like e.g. generalized varying coefficient models, for a review see e.g.

Roca-Pardinas and Sperlich [2007]. But the objective is not to find the optimal data fit or a model specification for which you can no longer replicate the shape of Figure 1.6;

the aim is to check whether intuitively reasonable modifications of (1.1) make the hollow, interpreted as ’mid-life crises’, disappear.In this respect we fit a series of plausible models as a robust check for potential misspecification.

The special cases arise from equation (1.5), can be seen as follow 4: Generalized Linear Model (GLM):

Using equation (1.5) we can also express a fully parametric model. Here we consider a GLM model with G being an arbitrary monotone link, i.e. Ordered Logit Model, m(Age) = q3j=0jAgej with unknown j and mx a linear function of the continues co-variates. In this case mI(.) is explicitly set to zero.

Partial Linear Model (PLM):

In the class of semi-parametric models the PLM is when m(Age) is estimated non-parametrically, mI(.) is set to zero, the mx(.) is restricted to be linear in X and the G is an identity function.

Generalized Partial Linear Model (GPLM):

As in PLM we consider that m(Age) is smooth function and the rest of the covariates are linear with no interaction considered. However, in this case G is no more identity. We consider a Poisson link function on the Life Dissatisfaction (instead of Life Satisfaction) as dependent variable. Later we justify more precisely this choice in Section 1.2.4.

Additive Models (AM):

G is again set to identity, however, the linear model is now a sum of smooth functions of the covariates. That is not just m(Age) is smooth function but also mx is the sum of smooth functions for each continues covariates. To fit such model one can represent each smooth function as a linear combination of the orthogonal bases and the fitting algorithm will estimate the coefficient of each of the basis. In the original setting of the AM we consider there is no interaction. That is to say mI(.) = 0. Later we also consider the presence of interactions between covariates by modeling also the interaction of Age with all the other continues variables. To do so, one need to use the Tensor product bases. 5

Generalized Additive Models (GAM):

Generalized Additive Models , follows the AM in a sense that the index is still the sum of smooth functions of the covariates, however, the response may follow any exponential

4For the sake of consistency and coherence all the semi-parametric models that are introduced in this chapter are fit by using the function gam() from mgcv package in R. Only the ordered logit model is fit by using the polr() from MASS.

5The estimation of the non-parametric components in this section is done by using cubic regression spline. The smoothing parameter or the number of knots is chosen using Generalized Cross Validation (GCV).

family distribution. To fit the model a penalized-iteratively re-weighted least square algorithm must be used. For equation (1.5), that means the m(Age) and mx() are a smooth function and a sum of smooth functions respectively. In our settings we consider mI(.) to be zero. Gis now a known link function. Like the case in GPLM, we consider a Poisson link on the mirror of the responses.

The strategy for the remainder of this chapter is to run the following models: a semi-parametric Additive Model (AM), then one enriched by interaction terms, a Generalized Additive Model (GAM) with pre-specified link function, and finally a GLM with arbitrary monotone link G.

1.2.2 Semi-parametric additive model

In a first step we set G=identity but give full flexibility to the impact of each covariate under the restriction of additive separability. As most covariates are dummy variables, we apply spline functions only to Age, nights in hospital (N inH), years of education (Y of Edu), log of net household income (LN HI), and log of household size (LHS) to estimate

E[LS |Age,· · · , D] =m(Age)+m1(N inH)+m2(Y of Edu)+m3(LN HI)+m4(LHS)+DT—. The estimate ofm(Age) is plotted in Figure 1.7. We can see that the curvature of the age effect persist is flatter in this setting but shows the same behaviour as in the simpler PLM model. The other estimates are in compliance with the existing literature, cf. Ferrer-I-Carbonell [2005], Frey and Stutzer [2002] and Di Tella et al. [2003]. – a clear downward slope for LHS.6

Figure 1.7: Life-Satisfaction over the life span(age) in Germany using Additive Model.

1.2.3 Semi-parametric additive model with interaction

The non-parametric additive 2nd order interaction model that we estimate is of the form E[LS |Age,· · · , D] =m(Age) +m1(N inH) +m2(Y of Edu) +m3(LN HI)

+m4(LHS) +m5(Age◊N inH) +· · ·+DT ,

6The family’s limited resources must be divided between members; an increase in the family size leads to a decrease in the available resources per capita and thereby to a downward trend in happiness.

where is included the interaction of all continuous variables with age. To fit this model we use the tensor product smoothers. A tensor product interaction within a non-parametric additive model is considered to be an appropriate choice when the main effects as well as interactive effects are present whereas thin-plate smoothers are recommended in cases where the one-dimensional functions are suppressed. The resulting estimate of m(Age) is given in the Figure 1.8, left panel. It can be seen that there is a change in the curvature of the marginal effect function m(Age), but the hollow around 50 is still clearly visible.

It could be argued that for this model, the marginal impact of interest is m(Age) +

s q

jm4+j(Age◊Xj)dFX(x) with FX(·) being the cumulative joint distribution of X’s;

but given the results for the purely additive model above, it is not surprising that this does not make disappear the valley around the age of 50, see Figure 1.8 (right panel).

Figure 1.8: Marginal effect of age in the additive model with interaction: m(age) on the left, and m(Age) +s qjm4+j(Age◊Xj)dFX(x) on the right.

An alternative interaction specification could be to let gender interact with all vari-ables. This is motivated by the various studies that suggest that the model of life satis-faction would differ a lot by gender. Figure 1.9 shows the outcomes form(Age) in a PLM as in (1.1) but separated by gender. It can be seen that indeed the valley is much deeper for men than for women. The shape, however, is basically the same.

Figure 1.9: Marginal effect of age in the PLM for samples separated by gender: m(age) for female on the left, and for men on the right, respectively.

1.2.4 GAM with Poisson link function for (dis-)satisfaction

The obvious alternative to relaxing index function ÷ is to allow for more flexibility in the link G. However, when considering a generalized PLM, i.e. model (1.5) with mx being a linear function, and mI © 0, then it is recommendable to parametrically specify the

link. Although there exist estimators even for generalized additive models with unknown (monotone) link G but they are numerically extremely unstable unless the sample size is huge and the impacts of all covariates are almost independent from each other. From Figure 1.5 one would guess that a mirrored Poisson distribution might do. When we fit a GAM with the Poisson link function on the mirror of the responses (the response is now the ‘life dissatisfaction’ defined by DS = 10≠ LS) we obtain for m(Age) the function plotted in Figure 1.10. And again, the resulting outcome could be interpreted as a justification of the existence of the mid-life crisis. Some diagnostic graphs of the model are shown in Figure A.1 (right panel) in the Appendix A. We observe now a much better behavior at the tails than for the original PLM. This indicates that the extreme aversion problem has been solved by this link. Interestingly, the mirrored m(Age) estimate has basically the same shape as we found for the PLM, cf. Figure 1.6, though the hollow (here bump) has slightly shifted towards 55.

Figure 1.10: Marginal effect of age using the Poisson link with mirrored responses: m(Age) on the left, and EX,D

5

G1m(Age) +mx(X) +DT26 on the right.

One could argue that the function of interest was ratherEX,D

5

G1m(age) +mx(X) +DT26. This, however, is not necessarily true as we discussed above: if extreme aversion is

not related to real life-satisfaction but just to reporting it, then G only corrects for the reporting error. And as we said, the question of whether it is a ’reporting error’ or not, is an identification problem that cannot be answered from these data.

1.2.5 An ordered logit model with cubic age-function

On ground of the above arguments we now fit an ordered logit model. The linkG is thus defined via the logistic distribution of the residual terms of the latent equation for utility Ui in model (1.3), and the unknown ak. As the ak can be freely chosen with the sole restriction ak≠1 Æak, monotone transformations are automatically captured. As for the impact of age, the estimator adapted a cubic function, see Figure 1.11 (left panel). Due to the parametric restriction we had to use here, the estimate cannot be the same as those we saw for the PLM or the GAM with Poisson link; but the main finding and conclusion is the same. Also here applies the potential criticism that the function of interest might rather be EX,D

5

G1m(Age) +mx(X) +DT26(with mx being linear); but this time the problem is even more complex because Geventually captures both, extreme aversion and functional (mis-)specification of the index ÷(·).

Figure 1.11: Estimates of ordered logit model: impact of age on LS on the left, and a study of the link on the right (along the estimated cut points).

The estimated cut points do neither correspond to the link function we considered in our simulation study, nor to the identity link function that is generally assumed in the related literature. These cut points actually suggest a link function that presents a left skewed conditional distribution of the responses. As said, this unfortunately has an interplay with the modeling of index function ÷. To provide a better ground for comparison we depict in Figure 1.11 (right panel) the identity link and used the estimated cut points (in the Ordered Logit model) as a transformation set (recall a for extreme averse population) to plot the corresponding link. This provides another evidence that the link function is different from identity: moreover, individuals in this data set might comply to the optimistic population group introduced in Figure 1.1.

At this point one can even go further by considering a non-parametric generalized ad-ditive model with an unknown link that is proposed byHorowitz[2001] as a less restrictive dimension reduction approach. However, this method requires a location, sign and scale normalization which makes more challenging to interpret the results. Besides this method is not implemented in the existing softwares, so is of little use to the practitioners.