• Aucun résultat trouvé

A different operationalization of the peri-urban areas

Methodological axes

Chapter 2. Data and Methods

2.5 Methods of analysis

2.5.10 A different operationalization of the peri-urban areas

In Chapter 5 the last important aspect of this research is addressed: whether or not the peri-urban area, as provided by Swiss Statistics, is the most relevant identification of this geographical space when analysing the residential and life course trajectories. It is the aim to find out to which extent the residential,

37We used the ‘smooth.spline’ function in R (R Core Team 2013).

private and professional characteristics of the in-migrants change when a different operationalization of the peri-urban area is used. In Chapter 5 a description of this process is provided, however this section explains in more detail how these different operationalizations have come about.

After having evaluated multiple operationalizations of the peri-urban area, in both the national and international literature, it has been decided to adapt the existing Swiss municipal typology, in order to arrive at alternative operationalizations. With a multinomial logistic regression similarities between the different types in the Swiss municipal typology have been investigated.

The data and methods behind the results derived of these new operationalizations, which are presented from section 5.3 onwards, are not further discussed here, as they are executed in the same way as the results presented in Chapter 3 and Chapter 4.

For the creation of the alternative operationalizations it has been decided to incorporate the criteria of the municipal typology in a probabilistic way. Instead of excluding all municipalities below a certain threshold and including those if they are above it (see Figure 2-9), their chance of being included in the type increases as their criteria value level increases. The existence of one threshold (or multiple increasing thresholds), as in Figure 2-9, relates to a linear version of the probabilistic way. Other criteria may contain two thresholds, which determine the borders of inclusion into the type. In Figure 2-10 this deterministic point of view as well as the probabilistic interpretation is shown. With the probabilistic model the probability to be included into a certain type are increasing as the criteria value increases, up to a certain level, after which the probability to be included declines again.

Figure 2-9: Comparing a deterministic (black) and probabilistic (grey) approach:

linear

The advantage of the probabilistic approach over the deterministic approach is that this method allows investigating the relation between the different types in the typology. It is expected that examining the relations between the types of the municipal typology makes it is possible to regroup the territorial types into

more suitable groups for this research. The large detail in the municipal typology of Swiss Statistics is an important asset for geographical research, however interpreting it is not always straightforward. Gigon (2008) also indicates that despite the fact that the typology of Swiss Statistics provides more refinement of the territory, it is difficult to interpret and communicate due to its municipal level. It is however in the first place the goal to obtain a more inclusive peri-urban territorial type for this research and not to regroup all the territorial types in the typology.

Figure 2-10: Comparing a deterministic (black) and probabilistic (grey) approach: quadratic

Data selection

The data on all the criteria for the year 2000 have been obtained from Mr.

Schuler, one of the people involved in the creation of the municipal typology.

Data for a more recent year is unfortunately not available as in 2010 no complete census has taken place. From the 2000 dataset values for the percentage of commuters and the amount of federal taxes per person were missing, these have been added to the 2000 dataset from other sources (Statlas 2000b; Administration Fédérale des Contributions 2000b, 2000a)38. Appendix F provides information on all the criteria as listed for the 2000 municipal typology (Schuler et al. 2005:121-125).

Model explanation

To investigate the relations between the different types a multinomial logistic regression is used. Given a certain set of independent variables (nominal, binary, categorical etc.; see Appendix F) the model predicts the probabilities of

38For the cantons Zürich, Basel-Stadt and Thurgau the data was only available for the fiscal year 2000, whereas for the other cantons data was available for the fiscal year 1999-2000. As data for the former three cantons did not pertain to exactly the same period as the 2000 municipal typology, for 9 municipalities no data on the federal taxes was available. For these municipalities the value has been set to 0, to continue including them in the calculations.

the different categories in the dependent variable, which needs to be categorical (here the municipal type in 2000).

The way this model works is that it first looks at the effect every variable in the formula has on every type in the municipal typology, returning the coefficients. From this we can read the probability for every municipality to be in one of the types of the typology, returning the type with the highest probability.

This type of calculation takes essentially an average of all the criteria of all the municipalities for every type in the typology, and then passes by every municipality evaluating to which type the particular characteristics of this municipality fits the best. This type will then in turn obtain the highest probability. The model however looks further in indicating that the characteristics of this municipality are also not far from the average it has calculated for another type, giving this the second highest probability. It will continue this process indicating the probability this municipality falls in every type of the typology, often indicating zero probability for types that are very different. In a way the model looks at the criteria of a municipality indicating that if its level at a certain criteria was slightly higher or lower and for another criteria also higher or lower it would fit better in another type of the typology and this gives the possibility to see links between types in the typology.

Model selection

Often when multinomial logistic regressions are used it is the aim to arrive with the most optimal fit, with the highest well predicted proportion. In this instance it is however not the objective to obtain the model with the highest proportion of fit, but to arrive with a model where the proportion well predicted municipalities is not too high in order to leave enough wrongly predicted municipalities to conduct further calculations. Several models have been tested to arrive at a model fit that was optimal for the particular purpose here: finding types in the typology that might have connection. The models are estimated based on a random selection of 1896 municipalities of the total 2896 municipalities and are predicted on the remaining 1000 municipalities.

The model which included all the variables listed in appendix F-1 and the whole set of interactions (appendix F-2) resulted in a percentage of correctly predicted municipalities of around 78 per cent (with a final negative log likelihood of 614, a residual residence of 1230 and an AIC of 2490). A slightly simpler model, including all variables in a linear way and omitting all the interactions, obtained a very similar percentage of fit (76.2)39, however the AIC is lower, 2294. The negative log likelihood and residual residence are slightly higher; 913 and 1496. Both models provide around 20 per cent of wrongly predicted municipalities; however the latter model has been chosen for analysis as it is a simpler model and it produces the lowest AIC.

39The values presented are depended on the randomly selected sample, different samples produce different values. Therefore a duplication of the analysis might produce different, but very similar, results.

Clusters

The correlation between the different territorial types40, with the addition of clusters, as shown in Chapter 5, is based on the ‘average’ hierarchical cluster method in the corrplot function. The methods ‘complete’ and ‘mcquitty’ provided very similar results for the link between the peri-urban and the rich municipalities. Methods ‘single’, ‘median’ and ‘centroid’ did not provide very clear or logical clusters.

Instead of being based on the probabilities (‘corrplot’), clusters can also be created directly from the data matrix. There are several different options available in R to cluster data. For this research the K-Means clustering (k-means), Partitioning Around Medoids (PAM) and Fuzzy C-Means Clustering (cmeans) have been investigated41.

The first, k-means, clusters data from a data matrix based on k-means.

This method separates the points (in our case the municipal types) into specified groups (i.e. clusters; amount is specified by the user beforehand)

“such that the sum of squares from points to the assigned cluster centres is minimized” (R 2013c). This then results in a table which indicates how many and to which cluster the municipal types are assigned. PAM is a more robust version of the k-means method and it clusters the data around medoids. It minimises the sum of dissimilarities instead of the sum of squares. It looks for the indicated number of clusters or medoids in the dataset and clusters are created by assigning each observation to the nearest mediod (R 2013b). The last clustering method clusters the data based on the fuzzy c-means algorithm.

The dissimilarities are calculated with the sum of squares (R 2013a).

These different ways of clustering have provided quite similar results to the results from the multinomial logistic regression: a link between the peri-urban and the rich municipalities also seems to appear. However due to the fact that the types have been repartitioned over many different clusters (which need to be predefined in all three methods), the results are difficult to represent and are therefore not further included in this research.

40This correlation plot is executed with the ‘corrplot’ package in R (Wei 2013).

41For these calculations data from all 2896 municipalities is used and the same variables as in the aforementioned calculations. They are based on the packages ‘cluster’ and ‘e1071’ in R (Maechler et al. 2013; Meyer et al. 2014).

Chapter 3. The peri-urban area within the