
6.4.1 Evaluation of measurement properties

The descriptive statistics for the GERT and its subscales are provided in Table 2 and descriptive statistics, Infits, and Outfits for each of the 83 items are provided in Appendix A.

Infit indices ranged from .84 to 1.18 and Outfit indices ranged from .50 to 1.42, with only two items having Outfit values below .80 and one item having an Outfit higher than 1.20. These results indicate that the Rasch model overall fits the data, confirming the results of Schlegel et al. (under review).
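As an illustration of what the Infit and Outfit indices summarize, the following minimal Python sketch computes both mean-square statistics for a single item under the dichotomous Rasch model. It assumes that person abilities and the item difficulty have already been estimated. The function names and the toy data are hypothetical and are not taken from the software used in this study.

```python
import math


def rasch_prob(theta, delta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def item_fit(responses, thetas, delta):
    """Return (infit, outfit) mean squares for one item.

    responses: 0/1 answers of each person to this item
    thetas:    ability estimates of the same persons
    delta:     difficulty estimate of the item
    """
    squared_residuals = []
    variances = []
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, delta)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(variances)
    # Infit: squared residuals weighted by the model variance (information).
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical toy data: four persons whose ability equals the item difficulty.
toy_infit, toy_outfit = item_fit([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], 0.0)
```

Values close to 1 mean that the responses vary about as much as the model predicts. Because Outfit is unweighted, it reacts more strongly than Infit to unexpected responses from persons whose ability lies far from the item difficulty, which is consistent with the wider Outfit range reported above.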

Table 2. Descriptive statistics (mean recognition rates and standard deviations) from the present study as compared with the results from Schlegel, Grandjean, & Scherer (under review).

Difficulty parameters of the items ranged from -3.34 (item sou68) to 1.46 (item sur74), with a mean of -1.03 and a standard deviation of .92. As can be seen in Figure 1, the majority of the items are located in the lower ability range as compared to the ability distribution of our sample, i.e., in the lower part of the figure. As in the previous study, this indicates that the GERT is comparatively easy for our sample and that the test provides most information about a participant’s ability in the lower range of the latent dimension θ. This is reflected by the fact that the TIC in Figure 1 is highest at a θ level of about -1. Accordingly, the measurement precision as indicated by the SDEM curve in Figure 1 is also highest at this level. As can be seen from Appendix B, the GERT measured most precisely at an ability level of -1.07 (SDEM=.24) and still showed comparable precision at an ability level of 1.00 (SDEM=.32).

Cronbach’s Alpha was .79. These results are very similar to the findings reported by Schlegel et al. (under review) and suggest that the measurement precision of the GERT is satisfactory in two different populations. As can be seen from Appendix A, the Raju test flagged one item, int39, as showing DIF (Raju value=3.51, p<.001). This item had also displayed DIF in favour of men in Study 1, but had not been excluded in order to maintain an equal number of items per emotion. Given that the overall Andersen LRT with gender as a splitting criterion was not significant (LR-value=94.229, df=82, p=.168), we concluded that gender DIF was not an issue of concern in the GERT.

A difference from the previous study by Schlegel et al. (under review), however, was revealed when comparing the mean recognition rates for the different emotions between the two studies (see Table 2). Anger (t(424) = 6.66, p < .001), disgust (t(424) = 9.65, p < .001), fear (t(424) = 7.42, p < .001), and surprise (t(424) = 8.24, p < .001) were significantly better recognized by French-speaking participants in the present study, and sadness (t(424) = 8.45, p < .001) and despair (t(424) = 10.20, p < .001) were significantly better recognized by German-speaking participants in the previous study. Inspection of the confusion matrices of the two studies (see Appendix C and Supplementary Material of Schlegel et al., under review) provided a deeper insight into these differences. For example, participants in the German-speaking sample were more likely than participants in the present study to confuse anger and disgust with irritation, fear with anxiety and despair, and surprise with anxiety and interest. In contrast, participants in the present study were more likely to confuse sadness portrayals with despair and vice versa.

While such confusion patterns were to be expected based on previous literature (Bänziger & Scherer, 2010), the differential frequency of these patterns between the two language groups was surprising and will be discussed in detail in the discussion. As the sample in the study by Schlegel et al. (under review) had a higher mean age, we also considered age as a potential reason for the differences in recognition accuracy between the two studies. To make a more appropriate comparison, we calculated recognition accuracies separately for participants in the previous study who were under 35 years of age (which was the maximum age in the present study). However, results for this subgroup were not much different from the results for the whole sample and the reported differences to the present study remained significant.

Figure 1. Wright Map displaying the distribution of the sample’s ability estimates, the Test Information Curve (TIC), and the Standard Error of Measurement (SDEM) on the left side and the item difficulties on the right side of the latent dimension θ.

Note. The labels on the right side represent the item numbers (1 to 83) and the respective emotion category. Amu=amusement, ang=anger, dis=disgust, des=despair, pri=pride, anx=anxiety, int=interest, irr=irritation, fea=fear, ple=pleasure, rel=relief, sur=surprise, sad=sadness. The mean of the ability distribution was fixed to zero.

6.4.2 Construct validity

The descriptive statistics for the ERA tests, the MSCEIT, the cognitive aptitude tests, and the Big Five dimensions and the correlations of these scales with the GERT mean score are reported in Table 3. In line with our predictions, the GERT was moderately positively correlated with all ERA tests, with the highest correlation observed for the MERT total score (r=.516, p<.01). Correlations for the MSCEIT were significant as well, but slightly lower in magnitude, confirming our expectation. The GERT was also significantly positively associated with all cognitive intelligence measures as expected. With respect to personality, we found a positive correlation with openness, confirming past research (Hall et al., 2009), but in contrast to our expectations, neuroticism was positively and not negatively correlated with the GERT. These relationships can also be seen in the plot of the correlation matrix including all measures in this study (Figure 2). In this figure, scales that are located in the centre share more variance with all other scales, whereas scales located more towards the boundaries can be considered as more specific and less related to the other scales. The GERT is located in the centre of this plot, suggesting that it is most representative of the nomological net of ERA and emotional intelligence as predicted. In line with past research, we also found a significant positive correlation between the GERT and gender, indicating that men overall obtained lower scores than women. The main predictors of the GERT identified with stepwise regression analysis were the MERT total score, the MSCEIT total score, neuroticism, and the JACBART, which in total explained 41% of the GERT variance (see Table 4)⁷. Although originally correlated with the GERT, the cognitive ability measures and the MiniPONS were no longer significant predictors once the other ERA tests and the MSCEIT were accounted for.

7 The same four variables emerged as the only significant predictors when all 12 variables (test scores and Big Five scores) were entered in the regression together. However, the JACBART turned out to be only marginally significant after the WLS-based recalculation of the final stepwise regression model.

Table 3. Means and standard deviations of the measures used for construct validation and correlations with GERT score.

4 MSCEIT understanding emotions 0.48 0.07 .357**

5 MSCEIT managing emotions 0.38 0.07 .384**

Gender (0=male, 1=female) 0.49 .204*

Note. GERT= Geneva Emotion Recognition Test, MSCEIT= Mayer-Salovey-Caruso Emotional Intelligence Test, MERT= Multimodal Emotion Recognition Test, NV5-R= Intelligence Battery, JACBART= Japanese and Caucasian Brief Affect Recognition Test, MiniPONS= short Profile of Nonverbal Sensitivity. N=131. Due to technical errors in the online session, data could not be used for 7 participants in the MERT, 3 participants in the MiniPONS, and 3 participants in the MSCEIT. Scale range was 0 to 5 for the Big Five scales and from 0 to 1 for all other measures.

Table 4. Final model of the stepwise regression analysis with the GERT score as the dependent variable.

Predictors     B     SE    Beta  t(df=116)  p
(Constant)     0.19  0.06        3.11       0.002
MERT total     0.39  0.08  0.39  4.95       <0.001
MSCEIT total   0.35  0.11  0.23  3.03       0.003
Neuroticism    0.03  0.01  0.27  3.72       <0.001
JACBART        0.10  0.05  0.14  1.91       0.059

R2=.433, Adjusted R2=.413, N=121

Note. GERT= Geneva Emotion Recognition Test, MSCEIT= Mayer-Salovey-Caruso Emotional Intelligence Test, MERT= Multimodal Emotion Recognition Test, JACBART= Japanese and Caucasian Brief Affect Recognition Test.
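The fit statistics reported with Table 4 can be cross-checked against each other with the usual adjustment formula for R². The sketch below assumes a model with four predictors plus an intercept and N=121, as reported. The function names are hypothetical.

```python
def residual_df(n, k):
    """Residual degrees of freedom for n cases and k predictors plus an intercept."""
    return n - k - 1


def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / residual_df(n, k)


# Values reported for the final model: R2=.433, N=121, four predictors.
table4_df = residual_df(121, 4)
table4_adjusted = adjusted_r2(0.433, 121, 4)
```

With these inputs the residual degrees of freedom are 116, matching the t(df=116) column, and the adjusted value rounds to .413, matching the reported Adjusted R2.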

Figure 2. Correlation plot of all measures included in the construct validation of the GERT.

Note. GERT= Geneva Emotion Recognition Test, MSCEIT= Mayer-Salovey-Caruso Emotional Intelligence Test, MERT= Multimodal Emotion Recognition Test, JACBART= Japanese and Caucasian Brief Affect Recognition Test, MiniPONS= short Profile of Nonverbal Sensitivity. The numbers correspond to the scales reported in Table 3. Edge thickness indicates the strength of the correlation (the thicker the edge, the stronger the correlation), edge color indicates the direction of the correlation (green= positive, red= negative).

Figure 3. Semantic profiles of selected GERT emotion categories in German and French obtained from the GRID study.

Note. Higher values indicate more positive valence and higher arousal, power, and unpredictability. *p<.05, **p<.01, ***p<.001.

6.5 Discussion

The examination of the measurement properties of the GERT using the Rasch model largely confirmed the results found in the previous study by Schlegel et al. (under review). In particular, the GERT showed a similar level of difficulty and acceptable measurement precision, and was not affected by gender differential item functioning (DIF).

However, although the overall recognition rate was similar in both studies, substantial differences between the two samples were observed for some of the emotions. Given that the participants in the first study were German native speakers, whereas the second sample consisted of French native speakers, cultural differences are a likely reason for the observed discrepancies. Given that age was excluded as a potential reason, two explanations might account for this. First, subtle variations in the expressions associated with certain emotions between French- and German-speaking regions, so-called “nonverbal accents” (Elfenbein et al., 2007), might explain the advantage of the French-speaking sample in recognizing surprise, fear, anger, and disgust, given that the stimuli used were portrayed by actors from the same region. However, the superiority of the German sample in distinguishing sadness and despair speaks against this hypothesis. Second, differences in the semantic meanings of emotion words between the two languages might influence which label is more likely to be chosen for a certain emotion portrayal (Mesquita, Frijda, & Scherer, 1997; Wierzbicka, 1992).

For example, semantic differences between the German and French words for despair might account for the advantage of German speakers in distinguishing sadness and despair. That is, the German word for despair (“Verzweiflung”) might be associated with much higher arousal than sadness (“Traurigkeit”), whereas the French word (“désespoir”) might rather have a connotation of passive hopelessness and thus be more closely related to sadness (“tristesse”) in terms of arousal or perceived power.

To follow up on these assumptions, we compared the semantic profiles for the six German and French GERT emotion terms that showed significantly different recognition percentages using data from the GRID study (Fontaine, Scherer, & Soriano, in press). Based on a questionnaire measuring 142 features of emotional experiences, the authors identified four dimensions that reflect the meaning of emotion terms, namely valence, arousal, power, and unpredictability. Figure 3 compares the profiles of the French and German-speaking GRID samples for sadness, despair, anger, disgust, fear, and surprise⁸. Results showed significant differences in the profiles particularly for sadness, despair, and disgust. As expected, despair has a meaning associated with higher arousal and power in German than in French. In addition, in French sadness and despair are very similar in their semantic profile, being perceived as low in arousal and power. However, the actor portrayals in the GERT representing despair presumably reflect more arousal than the portrayals for sadness according to the scenarios that the actors had received⁹. As a consequence, the distinction between despair and sadness might have been easier for German rather than French-speaking participants. We also observed that the term disgust has a more aroused and more unpredictable connotation in German than in French. Disgust expressions might therefore need to be more intense in order to be labeled correctly for German speakers than for French speakers. Fear, surprise, and anger did not differ much in their meaning between the two language groups.

Based on these exploratory analyses, we can speculate that the advantage of the French-speaking sample in recognizing these latter emotions is not related to differences in the meaning of the emotion terms, but more likely to nonverbal dialects as proposed by Elfenbein et al. (2007). In contrast, language differences might indeed account for the German sample’s higher accuracy in recognizing sadness and despair. Obviously, further research is needed to replicate and extend these results and to examine the equivalence of test scores between samples from different cultures. However, even though we found differences in the recognition accuracy of some emotions between German and French speakers, this does not necessarily speak against the validity of the GERT in different cultures or language groups, as is discussed in more detail below. Specifically, it should be noted that the confusion patterns for all emotions were similar for the German and French-speaking samples, although the frequencies with which certain confusion categories were chosen differed between the studies (see Appendix C and Supplementary Material of Schlegel et al., under review). For example, fear was confused most often with despair and anxiety in both samples. This suggests that the overall understanding of the portrayals is similar and hence, the same construct of ERA is measured in both groups.

8 We calculated the score of an emotion on a given dimension from GRID participants’ individual ratings. Results from a PCA on the 142 emotion features from the US sample reported by Fontaine et al. (in press) were used to attribute each feature to one of the four dimensions based on the highest absolute loading of the feature. The mean score of all features loading highest on a given dimension was then calculated for each emotion (negatively loading features were recoded).

9 The scenarios can be downloaded from http://www.affective-sciences.org/gemep.
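The scoring procedure described in footnote 8 can be sketched roughly as follows. This is only an illustration under stated assumptions: the feature names, loadings, and ratings are invented, the rating scale bounds are hypothetical parameters, and "recoding" is assumed here to mean reverse-scoring a feature on its rating scale.

```python
def assign_dimensions(loadings):
    """Attribute each feature to the dimension with its highest absolute loading.

    loadings: dict mapping feature -> dict mapping dimension -> PCA loading
    Returns a dict mapping feature -> (dimension, sign of that loading).
    """
    assignment = {}
    for feature, by_dimension in loadings.items():
        best = max(by_dimension, key=lambda d: abs(by_dimension[d]))
        sign = 1 if by_dimension[best] >= 0 else -1
        assignment[feature] = (best, sign)
    return assignment


def dimension_scores(ratings, assignment, scale_min, scale_max):
    """Mean rating per dimension for one emotion term.

    Features with a negative loading are reverse-scored (an assumption).
    """
    totals = {}
    counts = {}
    for feature, rating in ratings.items():
        dimension, sign = assignment[feature]
        value = rating if sign > 0 else scale_min + scale_max - rating
        totals[dimension] = totals.get(dimension, 0.0) + value
        counts[dimension] = counts.get(dimension, 0) + 1
    return {d: totals[d] / counts[d] for d in totals}


# Invented example with three features, two dimensions, and a 1-9 scale.
example_loadings = {
    "f1": {"valence": 0.8, "arousal": 0.1},
    "f2": {"valence": -0.7, "arousal": 0.2},
    "f3": {"valence": 0.1, "arousal": 0.6},
}
example_ratings = {"f1": 7, "f2": 3, "f3": 5}
example_scores = dimension_scores(
    example_ratings, assign_dimensions(example_loadings), 1, 9
)
```

In the invented example, f2 loads negatively on valence, so its rating of 3 is reversed to 7 before averaging with f1, giving a valence score of 7 and an arousal score of 5.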

This view is also supported when calculating “family recognition” scores in which participants’ responses are also considered as correct if they wrongly chose an emotion category that belongs to the same emotion family as the correct target category. In the GERT, anger and irritation, anxiety and fear, and sadness and despair each form an emotion family according to theoretical considerations (Bänziger & Scherer, 2010). The mean family recognition scores for anger (calculated by counting the response options anger and irritation for anger stimuli as correct) are very similar for the German (.97) and French (.99) samples, suggesting that in both samples the anger/irritation family is equally well recognized in a coarse sense. Similarly, for despair and sadness, family recognition scores attenuate the difference between the German and French samples. Specifically, the difference in mean values is reduced from .25 to .12 (.84 in the German and .72 in the French sample) for despair and from .20 to .05 (.89 in the German and .84 in the French sample) for sadness. For fear, the original difference of .20 is reduced only slightly to .16 when calculating the family recognition scores, which is due to the fact that the major confusion category for fear, namely disgust, is not considered part of the fear/anxiety emotion family. Taken together, this suggests that in terms of broad family recognition, the GERT might be more cross-culturally stable than when fine-grained recognition scores are considered. It remains debatable whether for fine-grained recognition such stability across different language groups can be achieved at all, given the subtle differences in meanings of the respective emotion terms.
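The difference between strict and family recognition scoring can be made concrete with a short sketch. The three emotion families are those named in the text; the trial format, function names, and example responses are hypothetical.

```python
# Emotion families as described in the text (Bänziger & Scherer, 2010).
FAMILIES = [
    {"anger", "irritation"},
    {"anxiety", "fear"},
    {"sadness", "despair"},
]


def family_of(emotion):
    """Return the family containing the emotion, or the emotion on its own."""
    for family in FAMILIES:
        if emotion in family:
            return family
    return {emotion}


def recognition_scores(trials):
    """Return (strict, family) proportions correct.

    trials: list of (target emotion, chosen response) pairs
    """
    strict_hits = sum(1 for target, response in trials if response == target)
    family_hits = sum(1 for target, response in trials if response in family_of(target))
    return strict_hits / len(trials), family_hits / len(trials)


# Invented responses to four anger portrayals.
example_trials = [
    ("anger", "anger"),
    ("anger", "irritation"),
    ("anger", "disgust"),
    ("anger", "anger"),
]
strict_score, family_score = recognition_scores(example_trials)
```

In the invented example the strict score is .50 and the family score is .75, because the irritation response counts as correct under family scoring whereas the disgust response does not.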

With respect to the second research question in this study, evidence for the construct validity of the GERT was established by examining the associations with similar tests, cognitive aptitudes, gender, and personality. The moderate correlations with all other ERA tests, regardless of the particular modality or set of emotions used in these tests, suggest that the GERT taps the main facets of the broad ERA domain instead of measuring very specialized aspects. This was confirmed by the fact that the MERT, the MSCEIT, and the JACBART independently contributed to the prediction of the GERT score. The GERT thus seems to capture ERA skills from different modalities (MERT) and more general emotional understanding and knowledge (MSCEIT). The MERT is also the test that resembles the GERT most in the procedure and the stimuli used. The specific contribution of the JACBART might consist in the recognition of particularly brief emotional expressions. Further support for the construct validity of the GERT is provided by the significant association with a range of cognitive aptitudes (general reasoning, attention, and vocabulary knowledge) which suggests that higher ability to maintain mental effort and higher perception speed as well as better verbal abilities contribute to the accurate perception and labeling of emotional expressions. This association became non-significant when controlling for the relationship with other ERA tests in the stepwise regression. Performance in the GERT thus seems to depend partly on cognitive aptitudes as suggested by Murphy and Hall (2010), but not to a larger extent than other ERA tests do.

With respect to personality, our hypotheses were only partly supported. We found a positive correlation between the GERT and openness, suggesting that open individuals attend more to emotional expressions as they are interested in stimulation (Matsumoto et al., 2000). On the other hand, we found a positive correlation with neuroticism that contradicts Matsumoto et al.’s (2000) argument that individuals high in neuroticism avoid the emotions of others because they are prone to experiencing negative emotions. However, Guarino, Roger, and Olason (2007) argue that besides the themes of depression and anxiety, an important component of neuroticism is other-directed emotional sensitivity, which they defined as being affected by other people’s emotions and found to be positively related to ERA. This implies that individuals high in neuroticism might attend more to other people’s emotions and be more affected by them, leading to an advantage in ERA (see also Mayer, DiPaolo, & Salovey, 1990). However, given the sparse empirical evidence and theoretical foundations of the links between ERA and personality, further research is needed.

To summarize, this study adds to the promising evidence for the validity of the GERT. First, we showed that the psychometric quality obtained in a French-speaking sample is comparable to the results reported by Schlegel et al. (under review) for a German-speaking sample and second, we empirically supported the nomological network expected for this test.

The main limitation of the test to date is the substantial difference in the recognition rates observed for some emotions with respect to the previous study by Schlegel et al. (under review), indicating that although the same construct is measured, the GERT is subject to cultural or language differences. The equivalence of test scores between samples from different cultures should be investigated in future research to rule out discrepancies in construct validity. To date, almost no studies have examined such differences for standard ERA tests, particularly between different Western cultures. However, we assume that minor differences in recognition accuracy cannot be completely avoided for any test that includes more than only basic emotions, given that the meanings of more fine-grained emotion categories are unlikely to be fully equivalent between languages. From the point of view of cross-cultural research on emotion recognition, using the GERT in future studies can extend existing findings to a) different cultural or language groups and b) to more emotions, given that to date mostly basic emotions have been studied.

Actor number, recognition rates, difficulty parameters and standard deviations, Infit/Outfit indices, and Raju values from gender DIF testing for the 83 GERT items.

Item Actor


Note. M=mean recognition rates, δ= difficulty parameters, SE=standard errors, amu=amusement, ang=anger, dis=disgust, des=despair, pri=pride, anx=anxiety, int=interest, irr=irritation, fea=fear, ple=pleasure, rel=relief, sur=surprise, sad=sadness. Actors 01, 03, 04, 05, 08 are male, actors 02, 06, 07, 09, 10 are female. The difficulty parameters have been estimated with the mean of the ability distribution fixed to zero. Positive Raju values indicate