
3. Data and methods

3.5 Covariates and complete profile analysis: historical evolution

Sex, age-group and canton

In the COMP dataset, the age-group variable contains five groups (65-69, 70-74, 75-79, 80-84, 85-94), and the canton variable covers only Geneva and Valais. The composition of the sample according to these variables is given in section 3.2, where the COMP dataset is described.

Education

In each of the three waves, the education variable captures the highest degree a person had obtained at the time of the interview. The variable was originally constructed with six levels, which I recoded into three main levels: low, grouping people with no formal education and those who completed only mandatory schooling (i.e. primary and secondary school); high, grouping everyone with any type of higher-education degree; and finally an “average” category, which consists of all people who completed an apprenticeship. The average category serves as the baseline category in the binomial logit regression models (this type of statistical model is described in section 3.8). Table 17 gives an overview of the distribution of educational levels in each of the waves analyzed.
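The recoding and the choice of baseline can be illustrated with a minimal Python sketch. The six category codes used below (`no_formal_education`, `primary`, `lower_secondary`, `apprenticeship`, `higher_vocational`, `university`) are hypothetical placeholders, since the text does not list the actual COMP labels, and the helper names are likewise illustrative. The sketch maps each code to low/average/high and then builds 0/1 indicators for every level except the baseline ("average"), which is how a reference category is usually omitted when a categorical covariate enters a logit model.

```python
from collections import Counter

# Hypothetical codes for the original six-level education variable;
# the actual COMP category labels are not given in the text.
RECODE = {
    "no_formal_education": "low",
    "primary": "low",
    "lower_secondary": "low",        # end of mandatory schooling
    "apprenticeship": "average",
    "higher_vocational": "high",     # any type of higher-education degree
    "university": "high",
}

LEVELS = ["low", "average", "high"]
BASELINE = "average"  # reference category in the logit models


def recode_education(raw):
    """Map a six-level code to low/average/high; missing (None) stays missing."""
    if raw is None:
        return None
    return RECODE[raw]


def dummy_code(level):
    """Return 0/1 indicators for every level except the baseline.

    Respondents in the baseline category score 0 on all indicators, so each
    logit coefficient compares one level with 'average'.
    """
    if level is None:
        return None
    return {f"edu_{lvl}": int(level == lvl) for lvl in LEVELS if lvl != BASELINE}


if __name__ == "__main__":
    raw = ["primary", "apprenticeship", "university", None, "lower_secondary"]
    recoded = [recode_education(r) for r in raw]
    print(recoded)                    # ['low', 'average', 'high', None, 'low']
    print(Counter(r for r in recoded if r is not None))
    print(dummy_code("average"))      # {'edu_low': 0, 'edu_high': 0}
    print(dummy_code("high"))         # {'edu_low': 0, 'edu_high': 1}
```

Keeping the recoding table and the baseline as separate constants makes it easy to check that every original code is assigned to exactly one of the three levels.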

Year   Low (%)     Average (%)   High (%)
1979   65 (67)     12 (11.5)     23 (11.5)
1994   47 (46.5)   16 (15)       37 (38.5)
2011   20 (18.8)   31 (31.7)     49 (49.5)

Table 17: Distribution of recoded educational levels, 1979-2011
Source: COMP. Note: Weighted values (adjusted for the stratified sampling design) in brackets; otherwise unweighted values.

What is striking in Table 17 is how strongly it suggests a fundamental shift in the educational composition of the population between 1979 and 2011. Over the observed period, the composition almost completely reversed. At the lower end, 67% of all respondents in 1979 had only a low level of education; by 2011, only 18.8% had a similar educational level. At the other end of the distribution, only 11.5% of people had some form of higher education in 1979, whereas by 2011 this share had increased to 49.5%. Finally, there are 109 missing values for this variable, representing less than 3% of the total sample and thus a negligible amount.
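The bracketed figures in Table 17 are weighted to account for the stratified sampling design. The text does not specify how these weights are built, so the Python sketch below assumes the common approach of giving each respondent a design weight equal to the population size of their stratum divided by the number of respondents sampled in that stratum. The strata ("A", "B") and all counts are hypothetical and chosen only to show how an oversampled stratum pulls the unweighted distribution away from the weighted one.

```python
from collections import defaultdict


def stratum_weights(population, sample):
    """Design weight per stratum: population size / number sampled.

    Assumed weighting scheme (inverse sampling fraction); the survey's
    actual weights are not described in the text.
    """
    return {s: population[s] / sample[s] for s in population}


def weighted_shares(levels, weights):
    """Percentage distribution of `levels`, each respondent counted by weight.

    Respondents with a missing level (None) are left out entirely.
    """
    totals = defaultdict(float)
    for level, w in zip(levels, weights):
        if level is not None:
            totals[level] += w
    grand_total = sum(totals.values())
    return {lvl: 100 * t / grand_total for lvl, t in totals.items()}


if __name__ == "__main__":
    # Hypothetical strata: B is oversampled relative to its population size.
    population = {"A": 300, "B": 100}
    sample = {"A": 2, "B": 2}
    w = stratum_weights(population, sample)    # {'A': 150.0, 'B': 50.0}

    strata = ["A", "A", "B", "B"]
    levels = ["low", "high", "high", "high"]
    weights = [w[s] for s in strata]

    print(weighted_shares(levels, [1.0] * len(levels)))  # {'low': 25.0, 'high': 75.0}
    print(weighted_shares(levels, weights))              # {'low': 37.5, 'high': 62.5}
```

Because the oversampled stratum B contains only high-education respondents in this toy example, weighting lowers the share of "high" from 75% to 62.5%.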

       VLV                                      Census (see note 37)
Year   Low (%)     Average (%)   High (%)       Low (%)   Average (%)   High (%)
1979   65 (67)     12 (11.5)     23 (11.5)      52        23            8.2
1994   47 (46.5)   16 (15)       37 (38.5)      ~22       ~49           ~23
2011   20 (18.8)   31 (31.7)     49 (49.5)      30.3      45.1          12.47

Table 18: Distribution of recoded educational levels, 1979-2011: comparison with census data
Source: COMP; Bundesamt für Statistik, 1985, 1993, 2012a.
Note: Weighted values (adjusted for the stratified sampling design) in brackets; otherwise unweighted values.

These somewhat surprising findings from the COMP dataset were compared with census data for Switzerland as a whole. This comparison is shown in Table 18.

The comparison shows a certain discrepancy between the COMP estimates and the federal census data. One of the main reasons for this is that the COMP dataset is not representative of Switzerland as a whole: it covers only the cantons of Valais and Geneva, a mountainous rural canton on the one hand and a highly urbanized canton on the other. As a reminder, the COMP and VLV surveys were designed to focus on selected key cantons rather than to maximize representativeness for Switzerland. The rationale behind this approach is to control for the socio-economic and cultural setting of each canton and to determine its effects on people of different age groups. This survey design ensures a sufficient number of cases in each age group within each canton, but it comes at the cost of representativeness for the rest of Switzerland.

       Valais                                Geneva
Year   Low (%)   Average (%)   High (%)      Low (%)   Average (%)   High (%)
1979   79        7.8           13.1          55        15            30
1994   58        14            29            36        16            48
2011   23        34            43            15        30            56

Table 19: Distribution of recoded educational levels, 1979-2011, in Geneva and Valais
Source: COMP. Note: Weighted values (adjusted for the stratified sampling design) in brackets; otherwise unweighted values.

To illustrate these cantonal differences, Table 19 shows the distribution of educational levels in Geneva and Valais. As pointed out above, the two cantons contrast sharply. In 1979, for example, 30% of people in Geneva already had a high educational level, compared with 13.1% in Valais. By 2011 the two cantons had moved closer together, yet important differences remained: Geneva still had a considerably higher share of people with higher education (56%) than Valais (43%). At the other end of the spectrum, only 15% of people in Geneva had little or no formal education, compared with 23% in Valais.

37 Sources: (Bundesamt für Statistik, 1980, p. 266; 1990, p. 36; 2014)

Finally, the discrepancies between the COMP (or VLV) data and the census figures might raise questions about data quality and possible survey biases. However, post-survey analyses performed since then do not support this interpretation. VLV in particular has been shown to capture all segments of the elderly population adequately, including the most vulnerable, who are traditionally susceptible to exclusion (Guichard, Nicolet, Monnot, Joye, & Oris, 2015).

Civil status

This variable has four levels: married, single, widowed, and divorced/separated.

In the binomial logit models, being married is the baseline category to which all other levels are compared. Table 20 shows the distribution of this variable in the COMP dataset.

Year   Married (%)   Single (%)   Separated/divorced (%)   Widowed (%)
1979   56 (52.2)     9 (9.7)      5 (4.9)                  30 (33.2)
1994   59 (59.5)     7 (7.8)      6 (6.6)                  28 (26.1)
2011   58 (59.5)     6 (6.1)      11 (13.0)                24 (21.4)

Table 20: Distribution of civil status per year, 1979-2011
Source: COMP dataset. Note: Unweighted data showing distribution in sample.

The relative distribution of civil status in the 1979, 1994 and 2011 samples shows that being married is by far the most frequent category in all three waves.

Being single is relatively uncommon, representing less than 10% of the sample throughout the observed period. The share of divorced or separated people roughly doubled, from 5% in 1979 to 11% in 2011. Finally, the share of widowed people decreased from 30% to 24%. There are no missing values for this variable in the COMP dataset.

Complete cases for the historical comparison

When it comes to complete profiles (individuals for whom all the aforementioned variables are available), the situation is potentially problematic. Only 3292 individuals in the COMP dataset have such complete profiles, representing 81% of the original sample.

A closer look reveals that the main problem stems from the monthly household income variable. As already shown in section 3.4, non-response for this variable is generally high across all three waves.

Overall, there are 584 item non-responses for this variable, a share of roughly 14%. The subsequent parts of the analysis that focus on functional and mental health inequalities are less problematic: these health-based analyses can be carried out on 3834 individuals with complete profiles, representing 94.4% of the total sample. Since just over 5% of the sample is dropped from the statistical analysis, this part of the analysis should not be seriously affected by selection bias.
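Identifying complete profiles amounts to keeping only the records that have a value for every variable a given analysis needs, which is why the share of complete cases depends on the variable set. The Python sketch below uses four invented records and hypothetical variable names, and assumes for illustration that the health models do not require household income; it counts item non-response per variable and compares the complete-case shares for the two variable sets.

```python
from collections import Counter

# Hypothetical variable names for the covariates described in this section.
COVARIATES = ["sex", "age_group", "canton", "education", "civil_status"]
INCOME_VARS = COVARIATES + ["household_income"]
HEALTH_VARS = COVARIATES  # assumption: income not required for health models


def missing_counts(records, variables):
    """Count item non-responses (None or absent) for each variable."""
    return Counter(v for r in records for v in variables if r.get(v) is None)


def complete_profiles(records, variables):
    """Keep only records with a non-missing value for every variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * len(part) / len(whole)


# Four invented respondents; None marks an item non-response.
RECORDS = [
    {"sex": "F", "age_group": "70-74", "canton": "Geneva", "education": "high",
     "civil_status": "married", "household_income": 5200},
    {"sex": "M", "age_group": "80-84", "canton": "Valais", "education": "low",
     "civil_status": "widowed", "household_income": None},
    {"sex": "F", "age_group": "65-69", "canton": "Valais", "education": None,
     "civil_status": "single", "household_income": 3100},
    {"sex": "M", "age_group": "75-79", "canton": "Geneva", "education": "average",
     "civil_status": "divorced/separated", "household_income": None},
]

if __name__ == "__main__":
    print(missing_counts(RECORDS, INCOME_VARS))  # household_income: 2, education: 1
    print(share(complete_profiles(RECORDS, INCOME_VARS), RECORDS))  # 25.0
    print(share(complete_profiles(RECORDS, HEALTH_VARS), RECORDS))  # 75.0
```

As in the COMP data, a single variable with high non-response (here the hypothetical `household_income`) accounts for most of the lost cases.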

Given this situation, there are three main approaches to the problem of missing data (as described in King, Honaker, Joseph, & Scheve, 2001). The first consists of performing the analysis only on individuals with complete profiles. This guarantees that all analyses are performed on the same sample, enabling a coherent interpretation and comparison of results. The downside of this approach is that giving up almost a fifth of the sample would increase the risk of serious selection bias and generally biased results. The second approach consists of including missing values in the analysis as a separate category. More specifically, this means creating an additional answer category, “no reply”, which is then included in the statistical models. The advantage of this approach is that cases with missing values are not dropped, so that, in principle, no information is lost. However, its added value is questionable at best: the coefficients estimated for such a modeled missing-data category are often difficult to interpret and rarely yield substantive insights. The third solution is a statistical technique called multiple imputation, which is described in detail in section 3.8. Briefly, it consists of generating several imputed datasets in which the missing values are estimated from the structure of the original dataset (thereby taking into account the statistical uncertainty about the missing values), running the statistical models on each of these imputed datasets separately and, finally, pooling the results into a mean estimate with confidence intervals. In essence, the technique treats the coefficients obtained from the individual imputed datasets as being statistically distributed themselves. This approach is generally considered to produce statistically sound results.
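The pooling step is usually carried out with Rubin's rules, although the text does not state which pooling formula is used here (section 3.8 describes the actual procedure). As a minimal Python sketch under that assumption: for m imputed datasets, the pooled coefficient is the mean of the m estimates, and its total variance adds the average within-imputation variance to the between-imputation variance inflated by a factor (1 + 1/m). The imputation step itself is not shown, the coefficient values are invented, and the 95% interval uses a normal approximation rather than the t reference distribution with Rubin's degrees of freedom.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient over m imputed datasets with Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: the squared standard error of that coefficient in each dataset
    Returns (pooled estimate, total variance, within variance, between variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (n - 1 denominator)
    t = w + (1 + 1 / m) * b          # total variance
    return q_bar, t, w, b


def normal_ci(q_bar, total_variance, z=1.96):
    """Approximate 95% interval; the t-based interval with Rubin's df is somewhat wider."""
    se = math.sqrt(total_variance)
    return q_bar - z * se, q_bar + z * se


if __name__ == "__main__":
    # Invented results for one coefficient from m = 5 imputed datasets.
    estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
    variances = [0.010, 0.012, 0.011, 0.009, 0.008]
    q_bar, t, w, b = pool_rubin(estimates, variances)
    print(f"estimate={q_bar:.3f} within={w:.4f} between={b:.4f} total={t:.4f}")
    lo, hi = normal_ci(q_bar, t)
    print(f"95% CI (normal approx.): ({lo:.3f}, {hi:.3f})")
```

The between-imputation term is what carries the extra uncertainty due to missing data: if all imputed datasets gave identical estimates, the total variance would reduce to the ordinary within-dataset variance.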

In light of these options, the following solutions were adopted:

• First, the analysis of incomes and poverty (Gini coefficients, exploratory statistical models) will use multiple imputation to deal with the high number of missing values. This yields estimates that come as close as possible to those that would be obtained from a complete dataset without missing values.

• Second, the analysis of functional and mental health is carried out on individuals with complete profiles, i.e. 94.4% of the sample. With only a little more than 5% of the sample missing, I consider the results of these analyses to be close to representative of the whole sample: the bias introduced by missing values can be considered marginal.

The obvious question here is why I did not use multiple imputation for the whole sample in all the analyses of this chapter.

The reason lies in the epistemological basis of multiple imputation.

While I consider it entirely valid to estimate missing responses for poverty, for example, from a person's socio-demographic profile, I believe this is much more problematic for areas such as mental health, where the relationship with socio-demographic variables is less evident. For this reason, I refrain from using this technique for the second part of the analysis, on health. As already pointed out, given the small number of missing values (roughly 5%), the risk of sample selection bias should be negligible. I therefore consider this approach (multiple imputation for incomes and poverty, complete profiles for health inequalities) to be a solution that enables solid statistical analysis while guaranteeing high comparability of the results.

3.6 Covariates and complete profile analysis: economic resources