Part II Research Design

8 Analytical Methods

8.4 Structural Equation Models

Structural equation modeling (SEM) is a statistical technique that is widespread in the social sciences because of its ability to combine latent variables with qualitative causal assumptions. In relation to other methods, it can best be described as a combination of confirmatory factor analysis (see footnote 4), which represents the measurement part, and path analysis, which forms the structural part of SEM (Acock 2013). Structural equation modeling represents a convergence of two traditions, bringing together the psychometric focus on unobserved (latent) constructs and the econometric emphasis on causal prediction (Chin 1998).

The first use of structural equations likely goes back to Sewall Wright, who in the 1920s addressed the problem of simultaneity in the measurement of demand (for a discussion, see Goldberger 1972). Several decades passed before SEM started to receive widespread attention, and only at the end of the 1980s, as research questions in the social and behavioral sciences grew increasingly complex, were the first comprehensive books published (Hu and Bentler 1995). Advances in statistical software and the explicit interest of major journals, including the launch of a specialized journal in 1994 (‘Structural Equation Modeling: A Multidisciplinary Journal’), have certainly provided fertile ground for the proliferation of SEM techniques.

Before we go into more technical details, a brief comment on what we understand by causality in the context of SEM seems appropriate because we will recurrently use the term ‘causal’. In the social sciences it is very rare to have data obtained from randomized trials in which participants were randomly assigned to ‘treatment’ groups or in which it was possible to control the exposure to the independent variables (Acock 2013, p. 59). Consequently, even if a model fits the data very well, the term ‘causality’ has to be used prudently because, in most cases, the statistical results would be equally good if the relationship between two variables were reversed. Moreover, as discussed elsewhere, our primary objective in this volume is not to explain causal relationships between variables but to shed light on the relation (in the sense of ‘common configurations’) between three manifest outcomes of economic vulnerability and the risk constellations that are typical for these configurations.

Following the logic outlined by Rick H. Hoyle (1994) and Alan C. Acock (2013), we will consider separately the two components that make up SEM, the measurement component based on latent variables and the structural component stemming from causal path models.

As discussed in the theoretical part, latent variables are variables that are unobservable and are not directly measurable. They are used for analyzing concepts that are somewhat vague and can therefore only be circumscribed in an approximate manner. In SEM it is common to distinguish between two types of latent variables that are different with regard to their hypothesized relationship with the observed variables, which are called ‘indicators’. Most latent variables used in social and psychological research are presumed to cause the observable indicators, that is, a latent construct makes participants respond in a certain way to the survey questions (Edwards and Bagozzi 2000). Because these observable variables reflect the latent variable, they are called reflective indicators (see Fig. 8.1). To take the example of measuring stress, a latent construct could be built upon a series of survey questions measuring the typical symptoms of stress, all of which tend to be highly correlated.

4 According to Hoyle (1994), factor analysis can be considered a special case of the general structural equation model.

Sometimes it is more sensible to assume that the causal flow goes from the observed variables to the latent variable. Here, the former are called formative indicators (Fig. 8.2) because they cause the latent variable, which is referred to as a formative construct or a composite latent variable (Acock 2013). Typically, this is the case when each item provides a relatively independent piece of information about a broad construct, for example a deprivation index. Contrary to reflective indicators, which should all load very strongly on a single dimension, formative indicators vary in how strongly they are correlated with each other, and a simple exploratory factor analysis of them could easily yield several factors (Acock 2013, p. 142).

Fig. 8.1 Latent construct causing reflective indicators

Fig. 8.2 Formative indicators causing composite latent variable

Latent constructs such as the one depicted in Fig. 8.1 are usually based on questionnaire items that tap into the same construct, providing the basis for applying the method known as confirmatory factor analysis (CFA; for a detailed discussion, refer to Brown 2006, on which the following discussion is primarily based). Factor analysis aims at discovering the latent variable that influences all (or several) of these items and accounts for the correlation among them. The scale score (factor) thus obtained represents the shared meaning of the entire set of items, yielding a more parsimonious understanding of the underlying concept.

Foundational for understanding factor analysis is the common factor model (Thurstone 1947), which posits that each variable in a set of observed variables is a linear function of one or more common factors and one unique factor. Thus, the variance deduced from the correlation matrix between these variables can be divided into the common variance shared by all variables (accounted for by the latent factor) and the variance that is unique to each variable, which includes the variance that is specific to this variable and the random error variance (Brown 2006). From this model, two general approaches are derived, exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) (Jöreskog 1969). The difference between the two was succinctly stated by Bollen (2002, p. 615) in an article reviewing research on latent variables: ‘in exploratory factor analysis, the factors are extracted from the data without specifying the number and pattern of loadings between the observed variables and the latent factor variables. In contrast, confirmatory factor analysis specifies the number, meaning, associations, and pattern of free parameters in the factor loading matrix before a researcher analyses the data’ (cited in Marsh et al. 2014, p. 87). The CFA model is thus evaluated according to how well it reproduces the correlation matrix of the observed variables: the higher the shared variance (communality), the better the model fits the data.
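The variance decomposition of the common factor model can be illustrated with a minimal numerical sketch. It assumes a single standardized factor and four standardized reflective indicators; the loading values and all variable names are hypothetical and chosen only for illustration, not taken from the studies cited above.

```python
import numpy as np

# Hypothetical standardized loadings of four reflective indicators
# on one common factor (illustrative values only).
loadings = np.array([0.8, 0.7, 0.6, 0.5])

# Common factor model with one standardized factor:
#   indicator_i = loading_i * factor + unique_i
# Communality: share of an indicator's variance explained by the common factor.
communality = loadings ** 2
# Uniqueness: specific variance plus random error variance.
uniqueness = 1.0 - communality

# Correlation matrix implied by the model: the common part (outer product
# of the loadings) plus the unique variances on the diagonal.
implied_corr = np.outer(loadings, loadings) + np.diag(uniqueness)

print("communality:", np.round(communality, 2))
print("uniqueness: ", np.round(uniqueness, 2))
print(np.round(implied_corr, 2))
```

Under these assumptions, the correlation between any two indicators is simply the product of their loadings, which is why strongly loading reflective indicators tend to be highly correlated with each other.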

A causal path model, the structural component of SEM, denotes a set of variables (latent or observed) that stand in a particular relationship to each other. Two types of variables can be distinguished: exogenous variables are determined by factors outside the model, and endogenous variables are explained within the model as a function of other exogenous and endogenous variables (Pearl 2009). Translated into the language of a diagram, an exogenous variable is any variable that has no arrow pointing to it (it is never regressed on other variables), whereas an endogenous variable is any variable that receives an arrow (it is regressed on other variables), regardless of whether arrows also point away from it (Acock 2013).

Consequently, in SEM it is possible for variables to act as both independent and dependent variables. Endogenous variables of this kind are usually referred to as mediator variables because they intervene between an exogenous and an endogenous variable and make it possible to measure the indirect effect of an independent variable on a dependent variable (Hoyle 2011).
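The diagram rules and the notion of an indirect effect can be sketched in a few lines of code. The example assumes a linear path model with three hypothetical variables (`x`, `m`, `y`) and made-up path coefficients; in such a model the indirect effect along a path is the product of the coefficients on its arrows. The function name `classify_variables` is introduced here purely for illustration.

```python
# Hypothetical path model: each key is an arrow (cause, effect),
# each value an illustrative path coefficient.
paths = {
    ("x", "m"): 0.5,  # x -> m
    ("m", "y"): 0.4,  # m -> y
    ("x", "y"): 0.2,  # x -> y (direct path)
}


def classify_variables(paths):
    """Label each variable by the arrows it sends and receives."""
    sources = {cause for cause, _ in paths}
    targets = {effect for _, effect in paths}
    roles = {}
    for var in sorted(sources | targets):
        if var not in targets:
            # No arrow points to it.
            roles[var] = "exogenous"
        elif var in sources:
            # Receives an arrow and also sends one.
            roles[var] = "endogenous (mediator)"
        else:
            roles[var] = "endogenous"
    return roles


roles = classify_variables(paths)

# In a linear path model, the indirect effect of x on y via m is the
# product of the coefficients along the path x -> m -> y.
indirect_effect = paths[("x", "m")] * paths[("m", "y")]
direct_effect = paths[("x", "y")]
total_effect = direct_effect + indirect_effect

print(roles)
print("indirect:", indirect_effect, "total:", total_effect)
```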

The construction of confirmatory structural equation models, which is the type we are going to use, starts by outlining the theoretical model, that is, by stating the relationships among the variables. This makes it possible to visualize the causal relationships: the arrows between the components (variables or sets of variables) show the direction of the causal influence, where a change in component A leads (all other components remaining equal) to a change in B. Specifying the causal model implies formulating statements about a set of parameters, which are constants that indicate the relationship between variables. The values of the free parameters (see footnote 5), which may be factor loadings or regression coefficients, are then estimated: the covariance matrix of the variables in the empirical data set is compared with the covariance matrix implied by the model and, using a ‘fit criterion’ (e.g. maximum likelihood estimation), the parameter values under which the model best reproduces the data are determined (Hu and Bentler 1995). Besides considering the theoretical plausibility of the model, the researcher needs to ensure that the model contains at least as many pieces of information (observed variables and their covariances) as there are parameters to be estimated (see footnote 6).

5 A pathway between two variables can either be left free to vary, because the objective is to measure the relationship between the two, or be fixed, usually on the basis of estimates found in previous studies.
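The comparison of the two covariance matrices and the counting rule for identification can be sketched as follows. The text names maximum likelihood only as an example of a fit criterion and does not spell out a formula; the sketch assumes the standard ML discrepancy function, F = ln|Σ| + tr(SΣ⁻¹) − ln|S| − p, where S is the sample covariance matrix, Σ the model-implied one and p the number of observed variables. The function names and the example matrix are hypothetical.

```python
import numpy as np


def ml_discrepancy(sample_cov, implied_cov):
    """Standard ML discrepancy between sample and model-implied covariances.

    The value is zero when the model reproduces the sample matrix exactly
    and grows as the two matrices diverge.
    """
    p = sample_cov.shape[0]
    _, logdet_implied = np.linalg.slogdet(implied_cov)
    _, logdet_sample = np.linalg.slogdet(sample_cov)
    trace_term = np.trace(sample_cov @ np.linalg.inv(implied_cov))
    return logdet_implied + trace_term - logdet_sample - p


def degrees_of_freedom(n_observed, n_free_params):
    """Pieces of information minus free parameters.

    p observed variables provide p * (p + 1) / 2 distinct variances and
    covariances; a negative result means the model is under-identified.
    """
    n_information = n_observed * (n_observed + 1) // 2
    return n_information - n_free_params


# Hypothetical sample correlation matrix of three observed variables.
sample_cov = np.array([
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
])

# A deliberately poor model that implies no covariances at all.
independence_model = np.eye(3)

print(ml_discrepancy(sample_cov, sample_cov))          # perfect reproduction
print(ml_discrepancy(sample_cov, independence_model))  # misfit
# One factor, three indicators: 3 loadings + 3 unique variances.
print(degrees_of_freedom(n_observed=3, n_free_params=6))
```

In practice, estimation software searches over the free parameters for the values that minimize such a discrepancy; the sketch only evaluates it for two fixed candidate matrices.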

Among the main advantages of SEM, Chin (1998, p. vii) mentions the possibility of isolating measurement errors in the observed variables. Being able to measure and remove these errors, rather than assuming that they are random, as is the case in traditional regression models, increases the predictive power of the model (Acock 2013).

Another advantage is the greater flexibility researchers have in the interplay between theory and data as compared to stand-alone techniques such as factor analysis or multiple regression models: by moving from variables to constructs, SEM allows concepts to be aligned more closely with the corresponding statistical expression of the hypothesis (Hoyle 2011). In addition to the aforementioned capacity to model latent variables, this flexibility lies in the ability to model causal relationships between multiple independent and dependent variables. Given that in this volume we are dealing with three outcome variables and their interaction, this is a feature that makes SEM stand out as uniquely suitable for answering some of our research questions.