


3.8.2 Statistical analysis

The process of data reduction refers to a set of statistical techniques aimed at simplifying the dataset “so that the mass of details does not prevent us from seeing the forest for the trees” (Dörnyei & Csizér, 2012). Technically, these procedures closely resemble those of data cleaning; nevertheless, they also constitute the first steps that the “actual analysis of questionnaire data always starts with” (ibid.). In practice, data reduction consists of the techniques necessary to transform individual items into clusters referred to as multi-item scales.

On the one hand, these techniques include calculating the mean values of the items in the cluster, for instance through the compute scales function in SPSS. On the other hand, reliability concerns require the researcher to carefully consider which items to combine to form the scale.
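As an illustration, the averaging step performed through SPSS's compute function can be mirrored in Python with pandas. The item names and response values below are hypothetical, not the study's data:

```python
import pandas as pd

# Hypothetical responses of four participants to three 5-point Likert items
# that belong to one attitudinal cluster
df = pd.DataFrame({
    "att1": [4, 5, 3, 4],
    "att2": [5, 4, 3, 5],
    "att3": [4, 4, 2, 4],
})

# The multi-item scale score is the row-wise mean of the clustered items
df["attitude_scale"] = df[["att1", "att2", "att3"]].mean(axis=1)
```

All subsequent analyses would then operate on `attitude_scale` rather than on the individual items.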

As mentioned earlier (Section 3.4), there are two possible approaches to this question. The study followed a concept-based method, the theoretical aspects of which were explored previously. Here, I outline the statistical procedures involved in the creation of the item clusters used in the study.

A concept-based approach entails scale creation based on standardized item sets developed by previous research. Therefore, selecting the items to feature in a scale means discriminating between those that proved relevant to the investigation of the concept in question in the current study as opposed to the ones that, for some reason or other, did not perform well. As I explained in the section on the attitudinal variables included in the study, the performance of each item was measured by calculating its internal consistency coefficient as part of the scale. In statistical terms, this meant examining the Cronbach’s Alpha value for each question in the cluster, so as to decide which ones were to be excluded in order to enhance the internal reliability of the scale. The procedure is described in detail by Dörnyei (2007), who advises a minimum of α = .60, a principle which was also observed in the study.
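A minimal sketch of this internal-consistency check, using the standard Cronbach's alpha formula on made-up response data (the study itself relied on SPSS for this computation):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) response matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_vars / total_var)

# Hypothetical answers of five respondents to a three-item cluster
data = np.array([[4, 5, 4],
                 [5, 4, 4],
                 [3, 3, 2],
                 [4, 5, 4],
                 [2, 2, 3]])
alpha = cronbach_alpha(data)
# In practice one would also recompute alpha with each item removed;
# items whose removal raises alpha are candidates for exclusion,
# with alpha >= .60 as the minimum threshold observed in the study
```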

Nevertheless, in some cases, scales of this type might not function as expected or might suggest the presence of several underlying variables. In these instances, principal component analysis was used to identify the internal structure of the scale. The workings and practical implications of this method can be observed in the description of the scales on target language groups and direct contact (Section 3.4). The former could clearly be divided into two subscales based on the origin (American or English) of the speakers, while the latter formed four subsets embodying different types of contact. In summary, principal component analysis helped distinguish latent elements measured by the items in the original scale, thereby resulting in clusters of higher validity.
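To illustrate the idea, the following toy example runs a principal component analysis on simulated items in which two latent factors are deliberately built in, so the first two components dominate; this mirrors, in miniature, how PCA revealed the subscales, though all data here are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 200

# Two hypothetical latent factors (e.g. attitudes toward two speaker groups)
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)

# Six items: the first three load on factor 1, the last three on factor 2
items = np.column_stack(
    [f1 + rng.normal(scale=0.3, size=n) for _ in range(3)]
    + [f2 + rng.normal(scale=0.3, size=n) for _ in range(3)]
)

pca = PCA().fit(items)
# Two components capture most of the variance, suggesting the
# original scale splits into two subscales
var_first_two = pca.explained_variance_ratio_[:2].sum()
```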

Once the scales are established and the mean values of the item sets computed, all further analysis is based on these new variables instead of the original item values. First, descriptive statistics are calculated, which can describe the sample in terms of its general characteristics.

These characteristics include the average of all the answers (mean), the distance between the lowest and highest values (range), the number of respondents who chose a particular option (frequency), the most frequently chosen option (mode), and the extent of consensus among participants on a given question (standard deviation). The very nature of descriptive statistics limits their applicability: they provide little more than a general sense of the data and a summary of participants’ answers.
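These descriptive measures can be sketched in a few lines of Python; the answers below are hypothetical responses of eight participants to a single 5-point item:

```python
import pandas as pd

# Hypothetical answers to one questionnaire item
answers = pd.Series([3, 4, 4, 5, 2, 4, 3, 4])

desc = {
    "mean": answers.mean(),                          # average of all answers
    "range": answers.max() - answers.min(),          # lowest-to-highest distance
    "mode": answers.mode()[0],                       # most frequently chosen option
    "std": answers.std(),                            # consensus among participants
    "frequency": answers.value_counts().to_dict(),   # respondents per option
}
```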

In order to draw any conclusions that go beyond the sample, or, indeed, any conclusions at all, inferential statistics have to be used. As opposed to descriptive statistics, this type of data treatment examines “whether the results that we observed in our sample (for example, differences or correlations) are powerful enough to generalize to the whole population” (Dörnyei, 2007). This distinction, expressed in the form of significance values, “lies at the heart of statistics and failure to understand statistics is often caused by the fact that insufficient attention has been paid to this difference” (ibid.). Statistical significance, i.e. the probability that a particular result arose through chance alone, is therefore key to the reliability of the research.

Consequently, as Dörnyei (2007) points out, generalizable conclusions cannot be drawn by simply contrasting the descriptive statistics available for two different groups. It needs to be shown that the difference is not restricted to the dataset but reflects a larger trend, one that might possibly extend to the whole population. T-tests are statistical tools that can measure the significance of such differences, producing results that can be generalized beyond the sample. They are used when two groups or values are compared; when three or more groups are involved, one-way analysis of variance (ANOVA) scores are computed. In the study, for instance, the former was applied to test the significance of the differences between male and female students, whereas ANOVA was used to compare the four faculties. Listwise exclusion restricted the analysis to participants for whom all relevant variables were available.
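Both tests can be sketched with SciPy; the group labels and scores below are simulated for illustration only (the study used SPSS, and these are not its data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical motivation scale scores for two groups
male = rng.normal(3.4, 0.6, size=200)
female = rng.normal(3.8, 0.6, size=200)

# Two groups: independent-samples t-test
t, p_t = stats.ttest_ind(male, female)

# Four groups (e.g. four faculties): one-way ANOVA
faculties = [rng.normal(m, 0.6, size=100) for m in (3.2, 3.5, 3.6, 3.9)]
f, p_f = stats.f_oneway(*faculties)
# p-values below the conventional .05 threshold indicate that the observed
# differences are unlikely to be a product of chance alone
```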

In order to investigate the relationships between variables, the study made use of two statistical procedures, correlation analysis and regression analysis. Correlation coefficients show to what extent participants’ scores for two variables ‘move together’. While they can thus reveal interrelationships between two concepts by assuming that similar patterns indicate a connection, there are two important issues to keep in mind. First of all, similarly to the tests discussed above, the strength of the convergence is secondary in importance to the significance of the relationship. For, in reality, two variables are only related if their concurrence in the sample is not a result of random variance but a sign of true connection. Secondly, correlation analysis is often mistaken for a test of causation and its results tend to be interpreted as explanation. Nevertheless, significant coefficients only indicate a relationship between two variables, without taking into account possible causation or even the existence of a third factor influencing the scores.
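A minimal correlation example with SciPy on synthetic scores, showing that the coefficient comes paired with its significance value; the variable names and the underlying relationship are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 150

# Hypothetical scale scores: attitudes partly co-vary with motivation
attitudes = rng.normal(3.5, 0.7, size=n)
motivation = 0.6 * attitudes + rng.normal(0, 0.5, size=n)

r, p = stats.pearsonr(attitudes, motivation)
# A significant r shows that the scores "move together"; it says nothing
# about causation, nor does it rule out a third variable driving both
```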

Regression analysis (cf. Larson-Hall, 2010), which can be regarded as a simplified form of structural equation modeling (SEM), can test the contributions of several variables at the same time. Moreover, when supported by a well-founded theoretical framework, it can also support inferences as to the direction of a connection. Regression tests the contribution of any number of independent variables with regard to a single dependent variable. The method computes models consisting of several independent variables which, when combined, can explain some of the variance in the dependent variable. Thus, analyzed in the theoretical framework of the study and in the light of the past research findings outlined in Chapter 2, the regression analysis helped pinpoint factors in the dataset that can be argued to have an influence, be it direct or indirect, on participants’ motivation to learn English.

So as to examine the interrelationships of different independent variables (such as language attitudes) and their predictive power toward a single dependent variable (motivated learning behavior), the present study combines the individual models computed through regression analysis to produce a similar output to SEM: an interconnected ‘web’ or network of factors, which can be used to illustrate the structural makeup of the results. Based on the conceptual framework introduced in Chapter 2, the first step of this analysis focused on the key dependent variable of the research, motivated learning behavior. Once the set of variables with the greatest combined influence was identified, subsequent tests were carried out to examine the predictors of those variables. Repeated for each predictor, this analysis resulted in a layered set of regression models.

It is to be noted that, while regression analysis computes and compares several models, or sets of independent variables, some of these models have greater explanatory power than others, since they account for a larger proportion of the variance in the dependent variable. In the research, I used a stepwise method, which compares models beginning with the simplest one, consisting of the fewest factors, and adds independent variables one at a time, in the order determined by their correlational link to the dependent variable.
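This forward-entry logic can be roughly sketched with scikit-learn on synthetic data. The sketch is a simplification: SPSS's stepwise procedure can also remove variables at each step, whereas here predictors are only added, in order of their correlation with the dependent variable, while the explained variance (R²) is tracked:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 200

# Three hypothetical predictors with decreasing influence on y
X = rng.normal(size=(n, 3))
y = 0.8 * X[:, 0] + 0.4 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Entry order: strongest absolute correlation with y first
order = np.argsort([-abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(3)])

# Add one predictor per step and record the variance explained (R^2)
r2 = []
for k in range(1, 4):
    cols = order[:k]
    model = LinearRegression().fit(X[:, cols], y)
    r2.append(model.score(X[:, cols], y))
```

Each successive model explains at least as much variance as the previous one; the question the stepwise procedure answers is whether the gain from each added predictor is large and significant enough to justify the more complex model.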

As with other tests, significance is key, as is the amount of variation explained by the models.

These parameters allowed me to identify the combination of independent variables that is most likely to explain differences in the given dependent variable. In Section 4.6, I focus on the individual models that showed the greatest fit with the dataset, in terms of the separate and combined predictive power of their components. I then present these individual models as part of the schematic representation of the final, composite model of participants’ motivational profile in Section 4.7. Accordingly, only significant connections and predictors belonging to the models with the greatest explanatory power were considered in the complete structure.