
4.6 Comparison of model fit

4.7.2 CHC-based models

Next, we tested a CHC-based model with cross-loadings fixed to zero (Model 5) and another CHC-based model in which cross-loadings were freely estimated (small variance priors; Model 6). The comparison of Model 1 and Model 5 supports the hypothesis that the CHC framework provides a better description of the WISC–IV subtest scores than the classical four-factor model. As shown in Table 3, only one cross-loading was significant:

Arithmetic loaded on both the Gf factor and the Gsm factor. However, this model showed poor fit (PPP = 0.028, DIC = 9,632.226).
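The DIC values used to compare these models can be illustrated with a short computation. This is a minimal sketch, not the estimation software's actual implementation: it assumes posterior draws of the model deviance are available, and the helper name `dic` is ours.

```python
import numpy as np

def dic(deviance_draws, deviance_at_posterior_mean):
    """Deviance Information Criterion: DIC = D_bar + p_D, where the
    effective number of parameters p_D = D_bar - D(theta_bar)."""
    d_bar = np.mean(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d  # equivalently 2 * d_bar - D(theta_bar)

# Toy illustration with simulated deviance draws (values are arbitrary).
rng = np.random.default_rng(0)
draws = rng.normal(loc=9600.0, scale=5.0, size=4000)
print(dic(draws, deviance_at_posterior_mean=9590.0))
```

Lower DIC indicates better fit after penalizing model complexity, which is why a model such as Model 6 (DIC = 9,601.477) is preferred to Model 5 (DIC = 9,632.226).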

Table 3. BSEM analysis with no cross-loadings – CHC model (Model 5)

First-order loading estimates. Note. Loadings in bold were freely estimated; other loadings were fixed to 0.

Free estimation of the cross-loadings (Model 6) greatly improved the fit of the model (PPP = 0.529, DIC = 9,601.477; see Table 2). As shown in Table 4, all hypothesized major loadings had substantive backing, except the loading from Gf to Arithmetic, which was very low and did not exclude zero (0.115, 95% CI −0.319 to 0.519). In line with our previous classical ML-CFA analyses, this finding suggests that the Arithmetic score is not a measure of Gf in the French WISC–IV (Lecerf et al., 2010). It should be emphasized that the typical CFA approach to model specification (as represented in Model 5) suggests adopting a more complicated model (with the Arithmetic score loading on both Gf and Gsm), whereas the BSEM approach (Model 6) suggests a simpler model, in which Arithmetic loads only on Gsm. No other cross-loadings were considered substantive, because their 95% CIs always included zero. Although these nonzero cross-loadings have little clinical significance (because their magnitudes are small), they are nevertheless statistically important for correct and unbiased estimation of the model parameters. The small magnitudes of the cross-loadings suggest that, besides the major loadings, no secondary interpretation of the subtest scores is supported by the data.
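The decision rule applied throughout this section — a loading counts as substantive only when its 95% credibility interval excludes zero — can be sketched as follows. The function name and the simulated draws are illustrative, not the study's actual posterior output.

```python
import numpy as np

def substantive(loading_draws, level=0.95):
    """Return whether a loading's credibility interval excludes zero,
    together with the interval itself."""
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(loading_draws, [alpha, 1.0 - alpha])
    return (lo > 0.0) or (hi < 0.0), (lo, hi)

rng = np.random.default_rng(1)
major = rng.normal(0.75, 0.05, size=5000)  # a clearly nonzero major loading
cross = rng.normal(0.10, 0.15, size=5000)  # a small cross-loading straddling zero
print(substantive(major))  # interval excludes zero -> substantive
print(substantive(cross))  # interval includes zero -> not substantive
```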

Table 4. BSEM analysis with small-variance priors for cross-loadings – CHC model (Model 6)


Note. Loadings in bold were freely estimated. Other loadings were estimated with small (0.04) variance priors.

The saturations of the first-order factors in Model 6 were reduced in comparison to those found in Model 5, suggesting that inappropriate fixed zero cross-loadings increased the first-order factor correlations and thus the loadings on the g factor. Indeed, the loading of Gf on g was estimated at .956 in Model 5 (see Table 3) and .880 in Model 6 (see Table 4). Although it is not reported here, we found with maximum-likelihood estimation that the loading of Gf on g was unitary. This finding demonstrates again that inappropriate fixed zero cross-loadings likely increase first-order factor correlations and thus loadings on the g factor; more precisely, second-order loadings are often overestimated and are sometimes even found to be unitary with classical ML-CFA analyses.
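The effect described here can be made concrete: in a higher-order model, a subtest's implied loading on g is the product of its first-order loading and its factor's second-order loading, so an inflated second-order loading inflates every implied g loading under that factor. The subtest loading of .70 below is a hypothetical placeholder; the two second-order values are those reported for Gf.

```python
def implied_g_loading(first_order, second_order):
    """Higher-order model: the subtest -> g loading is the product of the
    subtest's first-order loading and the factor's loading on g."""
    return first_order * second_order

# Hypothetical Gf subtest with a first-order loading of .70:
print(implied_g_loading(0.70, 0.956))  # Model 5 (fixed zero cross-loadings), ~.67
print(implied_g_loading(0.70, 0.880))  # Model 6 (small-variance priors), ~.62
```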

Finally, we estimated direct hierarchical CHC models. Once again, the model with cross-loadings fixed to zero (Model 7) was rejected (PPP = 0.046). However, the direct hierarchical five-factor model with freely estimated cross-loadings (Model 8) showed the best overall fit to the data (PPP = 0.620, DIC = 9,493.646). As shown in Table 5, the direct loadings of the subtest scores on g were all substantively backed up. However, it is important to note that only the Gsm and Gs factors were really supported. Indeed, when the variance of the general factor was extracted as a breadth factor, g explained more variance of the subtest scores than did Gc, Gv, and Gf. Most important, no 95% credibility interval of the subtest loadings on Gc, Gv, and Gf excluded zero. These results suggest that there is very weak support for calculation of separate Gc, Gv, and Gf factor scores. In contrast, Gs and Gsm showed a distinct contribution in addition to the part of variance explained by the general factor.
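The variance decomposition underlying this comparison can be sketched with standardized loadings: in a direct hierarchical (bifactor) model, a subtest's variance splits into a g part (squared g loading), a broad-factor part (squared group-factor loading), and uniqueness. The loadings below are hypothetical placeholders, not values from Table 5.

```python
def variance_shares(g_loading, factor_loading):
    """Standardized bifactor decomposition of one subtest's variance into
    g, broad-factor, and unique components."""
    g_var = g_loading ** 2
    f_var = factor_loading ** 2
    return g_var, f_var, 1.0 - g_var - f_var

# A subtest loading .70 on g and .30 on its broad factor:
g_share, f_share, unique = variance_shares(0.70, 0.30)
print(g_share, f_share, unique)  # g dominates: ~.49 vs ~.09
```

When the broad-factor share approaches zero for every subtest of a factor, as for Gc, Gv, and Gf here, separate factor scores add little beyond g.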

Note. Loadings in bold were freely estimated. Other loadings were estimated with small (0.01) variance priors.

4.8 Discussion

Analyses reported in the technical manual of the WISC–IV support a four-factor structure. However, this factorial structure is not explicitly aligned with the largely consensual Cattell–Horn–Carroll (CHC) theory of cognitive ability. Nevertheless, it has been shown that the WISC–IV could be better described with a CHC-based model, including five factors (Flanagan & Kaufman, 2009; Keith et al., 2006). Some questions also remain about the factorial structure of the WISC–IV and about the constructs measured by each subtest score.

The majority of the studies addressing these questions used confirmatory factor analysis with a maximum-likelihood estimation procedure. However, this approach has several methodological and theoretical limits.

To address some of these limitations, we examined the French WISC–IV structure using Bayesian structural equation modeling (BSEM). The present study deals with three main questions: (a) Does a five-factor CHC-based model better fit the WISC–IV data than does the current four-factor structure? (b) What constructs are measured by each subtest score, and should cross-loadings be added to the model? Remember that with BSEM, the estimation procedure allows one to test more complex model structures because all parameters are free and are estimated simultaneously (major loadings and cross-loadings). This may result in a final model that is less sample specific and hence more suited to other samples. (c) Does a direct hierarchical model better fit the data than does a higher-order model, supporting a breadth conceptualization of g rather than a superordinate conceptualization of g?

First, concerning the debate about the structure of the WISC–IV, the results of the present investigation clearly indicate that the French WISC–IV subtest scores could be better described with a second-order general intelligence factor and five first-order factors defined according to the CHC framework: fluid reasoning (Gf), comprehension-knowledge (Gc), visual processing (Gv), processing speed (Gs), and short-term memory (Gsm). This finding is congruent with previous ML-CFA analyses and suggests that the CHC model would allow clinicians to make more adequate interpretations than the standard four-factor structure (Keith et al., 2006; Lecerf, Rossier, et al., 2010b). Therefore, to improve the clinical validity of test score interpretation, we recommend interpreting the French WISC–IV subtest scores according to the CHC classification, with the norms that we have developed for the French WISC–IV (Lecerf et al., 2012). The superiority of the direct hierarchical and higher order CHC-based models over the current four-factor alternatives also reinforces the idea that the PRI score should be split into two subcomponents (Gf and Gv).

Second, following this first debate about the factorial structure of the French WISC–IV, results from this investigation provide some very important information about the interpretation of, and constructs measured by, each subtest score. In brief, the main results were as follows: All but one of the major loadings depicted in Figure 1b were supported by the data, as the Arithmetic subtest score loaded on the Gsm factor but not on the Gf factor.

Remember that Keith et al. (2006) showed that this subtest measured Gf (and also Gsm and Gc). On the other hand, Flanagan and Kaufman (2009) assumed that Arithmetic required Gf and Gsm, particularly for younger children. Concerning the French version of the WISC–IV, Grégoire (2009) suggested that the Arithmetic score measured both Gsm and Gc. Our previous ML-CFA indicated that the French Arithmetic subtest score measured quantitative knowledge (Gq) and short-term memory (Gsm) and, most important, that it did not measure fluid reasoning (Gf). The results found with BSEM were relatively consistent with those results: The Arithmetic score loaded only on Gsm (and did not load on Gf). Therefore, the Arithmetic score should be considered a measure of Gsm, along with Digit Span and Letter–Number Sequencing. Nevertheless, the magnitude of the loading of Arithmetic on Gsm was smaller than the loadings of Digit Span and Letter–Number Sequencing on this factor. It should be noted that we did not test the hypothesis that the Arithmetic score measures quantitative knowledge (Gq) in the present study, because Arithmetic was the only subtest that appears to measure this ability in the French WISC–IV.

In addition, the results of both the hierarchical and the higher order CHC models demonstrated that the Similarities and Word Reasoning scores did not provide fluid reasoning (Gf) measures (Grégoire, 2009; Keith et al., 2006; Lecerf et al., 2010). Consistent with previous studies, the present results showed that the Block Design score loaded only on visual processing (Gv) and was not a measure of Gf (Lecerf et al., 2010). BSEM also indicated, unlike previous ML-CFA analyses, that the Block Design score was not a measure of processing speed (Gs). Concerning the Picture Completion score, our previous ML-CFA studies suggested that it primarily measures Gc and to a lesser degree Gv. In contrast, results with BSEM suggested a simpler interpretation of the Picture Completion score: This subtest measured only Gv and not Gc. Next, results indicated that the Matrix Reasoning subtest score mainly measured fluid reasoning (Gf) and had no substantive loading on Gv. This finding was not congruent with Carroll (1993), who suggested that fluid reasoning tests required visual processing. Consistent with our previous ML-CFA and in contrast to Keith et al. (2006), the Symbol Search score appeared to measure only processing speed (Gs); indeed, the cross-loading on Gv was not substantive. Finally, BSEM demonstrated that the Coding score loaded only on Gs and was not an indicator of Gsm. Altogether, besides the major loadings, no secondary interpretation of the subtest scores was supported by the data.

Thus, BSEM suggested a simple and parsimonious interpretation of the subtest scores. Estimating all cross-loadings simultaneously did not lead to a more complicated loading pattern.

BSEM also provided some important additional information concerning the correlations between the first-order factors and the loadings on the second-order factor. Indeed, in contrast to the results obtained with ML-CFA, results with BSEM indicated that the correlation between g and Gf was not 1 (or close to 1; .956 in the present investigation) but around .88. Some authors have suggested that, with classical ML-CFA estimation, statistical power could explain why it was not always possible to distinguish the general factor from fluid reasoning (Floyd et al., 2005). We suggest that inappropriate zero cross-loadings could also account for the unitary loading between g and Gf, because the correlations between first-order factors can be overestimated when using standard CFA estimators. Thus, in line with Matzke et al. and contrary to Gustafsson (1984), we argue that the equivalence of g and Gf in higher order models is better accounted for by statistical artifacts than by the mere equivalence of the two constructs.

Third, the present study favored a direct hierarchical model, which was more adequate than a higher order model. In other words, the structure of the WISC–IV was best represented by five first-order CHC factors, plus a general intelligence factor. Consistent with previous studies, it was found that although subtest scores were aligned with their theoretically assumed factors (e.g., VCI or Gc), the g factor accounted for the largest portions of variance.

Indeed, the variance explained by first-order factors was weak when the variance accounted for by the g factor was partialed out. For instance, 10 subtest scores exhibited a higher loading on g than on their respective first-order factor. Furthermore, results indicated that g explained the largest part of the variance of the Gc, Gf, and Gv subtest scores, in contrast to the Gsm and Gs factors. These findings are consistent with those reported by Watkins (2006) and Watkins et al. (2006) for the U.S. WISC–IV, and they indicate more generally that published cognitive batteries may be overfactored (Canivez, 2008). In addition, this finding was consistent with Gignac's results (2008) and showed that the conceptualization of the general factor as a first-order breadth factor is preferable to that as a higher order superordinate factor: The relationship between the general factor and each subtest score does not seem to be fully mediated by the broad abilities. Nevertheless, we want to emphasize that this finding does not necessarily suggest that there is a single underlying mechanism for g. To the contrary, we consider that g is a complex construct defined by the interaction between several mechanisms (Van Der Maas et al., 2006). From a practical point of view, the present results indicated that interpretation of WISC–IV subtest scores should focus mainly on the Full Scale IQ (FSIQ) and that standard and/or CHC index scores do not necessarily provide additional and separate information. Remember that it has been shown that the general intelligence factor accounted for an important part of the FSIQ variance and that FSIQ was the best predictor of academic achievement regardless of index score variability (Watkins, Glutting, & Lei, 2007).

In conclusion, in the present paper, we used BSEM to analyze the structure of the French WISC–IV and to determine the nature of the constructs measured by each subtest score. The results indicated that the BSEM approach to model specification and estimation performed better than the more classical and restrictive ML-CFA approach. The framework used in this study might be useful for other versions of the Wechsler scales and other intelligence tests. Although Bayesian methodology is used increasingly, to our knowledge it has not been frequently used in intelligence research and factor analysis of cognitive batteries, such as the WISC–IV. We argue that Bayesian methodology is very efficient for incorporating both knowledge and uncertainty when analyzing psychological instruments.


Chapter 5

Testing for multigroup invariance of the WISC-IV structure across France and Switzerland: Standard and CHC models

Objectives of this study: The aim of this fourth study is, on the one hand, to evaluate the invariance of the factorial structure of the WISC-IV in the French and French-speaking Swiss samples and, on the other hand, to determine whether the constructs involved in the subtests are similar in the two groups of children. If disparities in structure are observed between these two cultures, it is important to establish whether these differences are statistically significant. Thus, in order to verify whether French-speaking Swiss psychologists can continue to interpret the WISC-IV results of French-speaking Swiss children in the same way as those of French children, a multigroup analysis is conducted in this study. The two samples compared are the French standardization sample, composed of 1103 children, and a French-speaking Swiss sample of 249 children from several schools in the canton of Geneva.

Isabelle Reverte, Philippe Golay, Nicolas Favez, Jérôme Rossier, & Thierry Lecerf (in revision), Learning and Individual Differences.

5.1 Abstract

Measurement invariance of the WISC-IV factorial structure between French and French-speaking Swiss samples was investigated by means of multigroup confirmatory factor analysis (MGCFA). The first sample was the French WISC-IV standardization sample described in the French WISC-IV technical manual (ages 6:0 through 16:11 years), which included 1103 children (Wechsler, 2005). The French-speaking Swiss sample included 249 children ranging in age from 8 to 12 years (124 males, mean age = 10.16, SD = 1.12, and 125 females, mean age = 10.26, SD = 1.17). Multigroup higher-order models were estimated to assess measurement invariance. In a first step, multigroup models were used to analyze the current four-factor structure. In a second step, multigroup higher-order models were estimated for a CHC-based model with five factors. For both the four-factor and the CHC-based models, results supported partial measurement invariance: although configural and weak (metric) invariance criteria were met, intercept invariance was not demonstrated.
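The invariance sequence summarized above (configural, then metric, then intercept) is typically evaluated by comparing nested multigroup models with a χ² difference (likelihood-ratio) test. The sketch below uses hypothetical fit statistics, and the function name is ours, not the software's.

```python
from scipy.stats import chi2

def chi_square_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Likelihood-ratio test between nested multigroup models: a significant
    result means the added equality constraints worsen fit."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, chi2.sf(d_chi2, d_df)

# Hypothetical metric (equal loadings) vs configural (free loadings) models:
d, ddf, p = chi_square_difference(chi2_restricted=312.4, df_restricted=152,
                                  chi2_free=298.1, df_free=142)
print(f"delta chi2({ddf}) = {d:.1f}, p = {p:.3f}")
```

A nonsignificant difference at the metric step followed by a significant one at the intercept step corresponds to the partial invariance pattern reported here.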

Keywords: WISC-IV, CHC theory, measurement invariance, intelligence, multigroup SEM, cultural invariance

5.2 Introduction

The Wechsler intelligence scales, and more particularly the Wechsler Intelligence Scale for Children (WISC-IV), are the most widely used intelligence tests across the world (Bowden, Saklofske, & Weiss, 2011; Chen, Keith, Weiss, Zhu, & Li, 2010; Lecerf, Rossier, et al., 2010b).

The Wechsler intelligence scales have been adapted in approximately 20 nations and are employed for the assessment of intelligence (12 adaptations were made across the world for the WISC-III alone). To the best of our knowledge, exploratory and/or confirmatory factor analyses (EFA and CFA) have supported a four-factor structure in all of these countries.

The French WISC-IV was adapted from the U.S. version and showed adequate psychometric properties in the French standardization sample. Because no specific adaptation of the WISC-IV was made for Switzerland, Swiss practitioners must use versions adapted in other countries (Germany, Italy, and France). They are thus forced to assume that the WISC-IV measures the same psychological attributes in all groups. Although this assumption is probably justified, it still needs to be empirically tested.

Switzerland is a country with three linguistic regions (a German part, an Italian part, and a French part). Regarding the French part of Switzerland, no specific French-speaking Swiss standardization data are available for the WISC-IV. The factorial structure of the WISC-IV and the normative tables developed in France are used for the assessment of French-speaking Swiss children. Although France and the French part of Switzerland are very close geographically and supposedly culturally, there is no guarantee that their cultural and linguistic backgrounds are strictly equivalent. As stated by Ortiz et al. (2012), culture and language influence intelligence test performance, and for these authors, all tests have cultural and linguistic components. Therefore, testing for measurement invariance across both groups is an important issue for the clinical use of the WISC-IV in the French part of Switzerland. Measurement invariance must be examined prior to any interpretation in order to demonstrate that subtest and index scores have the same meaning for all samples (Chen et al., 2010). Without evidence of measurement invariance, similar interpretation of the WISC-IV subtest and index scores cannot be granted for French-speaking Swiss children.

Currently, the interpretation of the WISC-IV is based on a four-factor structure (Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed). Contemporary studies have shown that the four-factor solution provides satisfactory goodness of fit for several normative samples, such as the U.S. sample (Weiss, Saklofske, Prifitera, & Holdnack, 2006), Asian samples (Chen et al., 2010), the French normative sample (Lecerf, Rossier, et al., 2010b), and a French-speaking Swiss sample (Reverte, Golay, Favez, Rossier, & Lecerf, 2014). The first goal of this study was to investigate the measurement invariance of the current four-factor structure of the WISC-IV between a French and a French-speaking Swiss sample. Implicit is the assumption that WISC-IV scores have the same meaning for French and French-speaking Swiss children. Thus, equivalence is assumed to hold for the factorial structure, the factor patterns, and the underlying factors. As far as we know, measurement invariance of the WISC-IV scores across French and French-speaking Swiss children has never been reported. Thus, it is unknown whether the WISC-IV shows measurement and structural invariance across these two groups.

Even if the last revision of the WISC represents a significant revision and is more psychometrically and theoretically grounded (Grégoire, 2009), this battery still presents some important limitations, especially concerning the relation between test interpretation and contemporary theories of cognitive abilities. Indeed, the factorial structure of the WISC-IV is not completely in line with the Cattell-Horn-Carroll (CHC) model of intelligence measurement, which is currently the consensual psychometric-based model of human cognitive abilities (Alfonso et al., 2005). Several studies have demonstrated that CHC-based models were equally or more adequate than the four-factor scoring structure for the U.S. sample (Keith, Fine, Taub, Reynolds, & Kranzler, 2006; Weiss et al., 2013b), for the French standardization sample (Lecerf, Rossier, et al., 2010b), and for a French-speaking Swiss sample (Reverte et al., 2014). Thus, it has been suggested that the WISC-IV subtest scores measure five CHC factors: fluid reasoning (Gf), comprehension-knowledge (Gc), visual processing (Gv), short-term memory (Gsm), and processing speed (Gs). These findings suggest that the interpretation of the WISC-IV subtest scores could be improved by applying
