2.6 Measurement of emotion recognition ability

2.6.1 Reliability and validity of standardized emotion recognition tests

In this section, I will review the psychometric quality of interpersonal sensitivity and emotion recognition tests. Generally, the psychometric quality of a test or questionnaire can be considered high when the instrument is valid, i.e., when it measures what it is supposed to measure, and when it is reliable, i.e., when it captures true individual differences instead of measurement error or noise. Different types of validity have been proposed in the literature. Some of the most commonly evaluated validity concepts are ecological validity, construct validity (subsuming convergent, discriminant, and factorial validity), and predictive validity (Kane, 2008).

While the predictive validity of emotion recognition ability tests was reviewed in the previous section, below I will examine the reliability, ecological validity, and construct validity of these tests.

Reliability. Several indicators can be distinguished when assessing the reliability of a test. First, test-retest reliability refers to the stability of the measurement over time and reflects the extent to which individuals maintain their scores when tested again. Adequate test-retest reliability has been reported for several tests of emotion recognition ability. For example, for the MERT, test-retest reliability was .78 over six weeks (Bänziger et al., 2009). For the DANVA, Nowicki and Duke (1994) reported test-retest reliabilities of about .80. Second, reliability can be assessed in terms of internal consistency, reflecting the extent to which the items of a test are correlated and how homogeneous the test is. The most common measure of internal consistency is Cronbach’s Alpha. As noted by Kenny (in press), Hall, Halberstadt, and O’Brien (1997), and Hall et al. (2005), the majority of interpersonal sensitivity tests have low internal consistency (below .50) according to psychometric standards. This led some researchers to assume that individual differences in interpersonal sensitivity are rather limited (Kenny & Albright, 1987). In contrast, others have argued that low internal consistency does not necessarily impair the validity of a test. For example, Hall et al. (2005) and Scherer and Scherer (2011) have argued that items in such tests might measure quasi-independent, discrete aspects of a higher-order interpersonal sensitivity construct which is defined by the component variables. According to this reasoning (see also Bollen & Lennox, 1991), including a variety of distinct items in a test, each representing a somewhat different skill, may actually increase content validity. Within the domain of interpersonal sensitivity, the tests with the best internal consistencies are emotion recognition tests. For the JACBART, the DANVA, and the Emotion Perception subtest of the MSCEIT, acceptable Cronbach’s Alphas of about .75 to .85 have been found (Matsumoto et al., 2000; Nowicki & Duke, 1994; Brannick, Wahi, & Goldin, 2011). On the other hand, for other recent tests, namely the ERI and the MERT, Cronbach’s Alphas were not reported in the respective publications (Scherer & Scherer, 2011; Bänziger et al., 2009).
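The two reliability indicators discussed above can be illustrated with a minimal Python sketch. It computes Cronbach’s Alpha from a matrix of item scores using the standard formula (number of items, item variances, and variance of the sum score) and test-retest reliability as the Pearson correlation between two sessions. The function names and the toy data are hypothetical and are not taken from any of the cited tests.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's Alpha for a (respondents x items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    if n_items < 2:
        raise ValueError("At least two items are needed.")
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the sum score
    ratio = item_variances.sum() / total_variance
    return float(n_items / (n_items - 1) * (1.0 - ratio))


def retest_correlation(first_session, second_session) -> float:
    """Test-retest reliability as the Pearson correlation of two sessions."""
    return float(np.corrcoef(first_session, second_session)[0, 1])


# Hypothetical data: rows are test-takers, columns are items (1 = correct).
homogeneous_items = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
unrelated_items = [[1, 1], [1, 0], [0, 1], [0, 0]]

alpha_homogeneous = cronbach_alpha(homogeneous_items)
alpha_unrelated = cronbach_alpha(unrelated_items)

# Hypothetical total scores of four test-takers at two measurement points.
retest_r = retest_correlation([1, 2, 3, 4], [2, 1, 4, 3])
```

In this toy data, items that agree perfectly yield an Alpha of 1, mutually uncorrelated items yield an Alpha of 0, and the two sessions correlate at .60. The example also shows why a test built from deliberately heterogeneous items can have a low Alpha without being unstable over time.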

Ecological validity. A test can be considered ecologically valid when the measurement approximates the real-world setting that is being examined and when it is replicable under conditions that are typical for the population at large (Brewer, 2000). With respect to emotion recognition tests, ecological validity can refer to the stimuli (emotion expressions) and the task (choosing the expressed emotion) used. As can be seen from Table 1, most tests partly or exclusively use still pictures of faces as stimuli. As several researchers have argued, photographs of prototypical facial expressions might have low ecological validity as they are much less complex than emotional cues typically conveyed in real-life situations (Plesa-Skwerer, Faja, Schofield, Verbalis, & Tager-Flusberg, 2009; Isaacowitz & Stanley, 2011). Similarly, vocal stimuli that have been obtained through electronic filtering or randomized splicing, as in the PONS, might be perceived differently from natural speech. Hall and Schmid Mast (2007) and Elfenbein and Ambady (2002a) concluded that the ecological validity of emotion expressions would be maximized if visual and aural channels (face, voice, body) were presented holistically, since this mode is closest to the setting in which people make inferences about others in real life. However, only a few tests include such multimodal emotion expressions.

In addition, the ecological validity of some emotion recognition tests might be restricted as they contain only a few emotion categories, typically basic emotions. As Frank and Stennett (2001) have noted, this allows test-takers to choose their response based on discrimination and exclusion rules rather than by actually recognizing the correct emotion, which seems to be the more natural process. Especially regarding happiness, which is the only positive emotion in most tests, test-takers can make the correct choice as soon as they recognize a smile. Phillips and Slessor (2011) have therefore argued that future studies should also include more complex emotions such as shame, guilt, contempt, pride, or hope, as well as blended emotions (p. 281). For example, when studying emotion recognition in older age, this approach would increase the applicability of emotion recognition problems to everyday life.

A third limitation of most emotion recognition tests with respect to ecological validity is that they use posed or enacted emotion expressions that were produced by actors. These expressions are typically perceived as exaggerated and less natural. While some tests in the broader interpersonal sensitivity field contain recordings of spontaneous behavior (e.g., the Interpersonal Perception Task; Costanzo & Archer, 1989; or the Communication of Affect Receiving Ability Test; Buck, 1976), within the emotion recognition domain only the recent SERT (Kang, 2012) is based on spontaneous emotion expressions. This might be related to the fact that, when studying emotion recognition, researchers need to have more control over the features of the stimuli, such as the specific emotion category, the intensity, and the duration of the display. Collecting a set of spontaneous stimuli that match these criteria seems to be very difficult, especially in terms of defining the “correct” emotion in each expression.

Factorial validity. Factorial validity refers to the dimensional structure of a test, obtained with factor analytical methods, which should correspond to the theoretically hypothesized nature of the underlying construct. Very little is known about the dimensional structure of emotion recognition tests, as most relevant publications do not report the respective analyses. Unfortunately, theoretical accounts that could guide empirical analyses are lacking. Given that some of the tests in this domain show low internal consistency, it can be speculated that attempts to conduct factor analysis would often lead to uninterpretable results. It thus remains unclear whether the same underlying ability is involved in accurately decoding a) cues for different emotions, and b) cues in different modalities. As an exception, Rosenthal et al. (1979) examined the intercorrelations between scores on the different modalities of the PONS test (face, voice, body) and found them to be relatively orthogonal. More recently, Bänziger et al. (2009) concluded that scores on the different modalities in the MERT were related to some extent, but that most individuals are specialized in certain modalities and less proficient in others, making it worthwhile to consider modality scores separately. With respect to the question of emotion specificity in emotion recognition ability, only one study can be found. Suzuki, Hoshino, and Shigemasu (2010) found the recognition of happiness to be independent of the recognition of negative emotions from faces. However, this study did not use one of the standardized tests described above, but a paradigm developed specifically for that study.
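The question of whether modality scores are orthogonal or share a common dimension can be illustrated with a simple Python sketch. It takes the correlation matrix of the modality scores and computes its eigenvalues: if one eigenvalue dominates, the scores largely share one dimension, whereas roughly equal eigenvalues indicate orthogonal scores. This is a principal-component style inspection rather than a full factor analysis, and it is not the procedure used in the studies cited above. The function names and the toy scores are hypothetical.

```python
import numpy as np


def correlation_eigenvalues(scores) -> np.ndarray:
    """Eigenvalues (descending) of the correlation matrix of the columns."""
    scores = np.asarray(scores, dtype=float)
    correlations = np.corrcoef(scores, rowvar=False)  # columns are variables
    return np.linalg.eigvalsh(correlations)[::-1]     # eigvalsh sorts ascending


def first_component_share(scores) -> float:
    """Share of the total variance captured by the largest eigenvalue."""
    eigenvalues = correlation_eigenvalues(scores)
    return float(eigenvalues[0] / eigenvalues.sum())


# Hypothetical scores: rows are test-takers, columns are face, voice, body.
# In the first data set the three modality scores are mutually uncorrelated.
orthogonal_scores = [[6, 6, 6], [6, 4, 4], [4, 6, 4], [4, 4, 6]]
# In the second data set the three modality scores are perfectly correlated.
common_scores = [[6, 12, 11], [6, 12, 11], [4, 8, 9], [4, 8, 9]]

share_orthogonal = first_component_share(orthogonal_scores)
share_common = first_component_share(common_scores)
```

For the uncorrelated toy scores the first component captures one third of the variance, which is no more than any single modality. For the perfectly correlated scores it captures all of it. Real test data would fall between these extremes.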

Construct validity. Two central aspects of construct validity are convergent and discriminant validity. Convergent validity means that measures of the same or similar constructs should be substantially positively correlated, whereas discriminant validity is demonstrated through low or zero correlations with measures of theoretically independent or distinct constructs (Sartori & Pasini, 2007). More generally, construct validity can be established by developing a nomological network in which relationships between constructs are specified on theoretical grounds and are then empirically tested using the respective measures (Cronbach & Meehl, 1955). With respect to interpersonal sensitivity, low correlations between different measures have traditionally been found, suggesting that the different skills under the umbrella of interpersonal sensitivity are unrelated to each other. As Hall et al. (2005) noted, these low correlations might be a result of weak psychometric qualities of the measures involved. Alternatively, these researchers suggested that different measures reflect different facets of the broad interpersonal sensitivity domain and collectively define a higher-order construct. The correlations between more recent emotion recognition tests such as the DANVA, JACBART, ERI, and MERT tend to be higher (around .30), which might reflect the similar content (i.e., emotion expressions) and task in these tests (Bänziger et al., 2009; Bänziger et al., 2011). These results support the convergent validity of tests of emotion recognition ability.
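The argument that low correlations between measures might result from weak psychometric qualities can be made concrete with the classical correction for attenuation, which divides an observed correlation by the square root of the product of the two reliabilities. The following Python sketch assumes this classical test theory formula, which is not part of the reviewed studies. The function name and the input values are hypothetical and only loosely mirror the magnitudes mentioned in this section.

```python
import math


def disattenuated_correlation(observed_r: float,
                              reliability_x: float,
                              reliability_y: float) -> float:
    """Estimate the correlation between two constructs after correcting
    the observed correlation for unreliability of both measures."""
    for reliability in (reliability_x, reliability_y):
        if not 0.0 < reliability <= 1.0:
            raise ValueError("Reliabilities must lie in the interval (0, 1].")
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Illustrative values only: an observed correlation of .30 between two tests
# that each have a reliability of .50.
corrected_r = disattenuated_correlation(0.30, 0.50, 0.50)
```

With these illustrative values, an observed correlation of .30 corresponds to an estimated correlation of .60 between the underlying constructs. Low reliability can therefore mask a substantial relationship between two measures.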

With respect to discriminant validity, many studies have investigated the relationship between interpersonal sensitivity and cognitive intelligence. While nonverbal sensitivity, when measured with performance-based tests, is likely to contain a general reasoning component, researchers have assumed that it should be an ability distinct from intelligence (Murphy & Hall, 2011). In their meta-analysis, these authors found a small but significant relationship between interpersonal sensitivity and intelligence, which supports the discriminant validity of the construct.

Many studies have also examined the broader nomological net of interpersonal sensitivity, providing additional evidence for construct validity. As summarized by Hall et al. (2009), interpersonal sensitivity is positively related to empathy, social and emotional competence, extraversion, and openness. In contrast, interpersonal sensitivity is negatively related to shyness, neuroticism, and anxiety. These findings are generally in line with the assumption that interpersonal sensitivity is a valuable skill for successful social functioning.

2.6.2 Problems of existing emotion recognition tests and requirements for a new test