

Section de psychologie Sous la direction de Susanne Kaiser

STRUCTURAL ANALYSIS OF TEMPORAL PATTERNS OF FACIAL ACTIONS:

MEASUREMENT AND IMPLICATIONS FOR THE STUDY OF EMOTION PERCEPTION THROUGH FACIAL EXPRESSIONS

THESE

Présentée à la

Faculté de psychologie et des sciences de l’éducation de l’Université de Genève

pour obtenir le grade de Docteur en Psychologie

par Mr. Stéphane WITH

de Genève

Thèse No 455

GENEVE

Mars 2010


INDEX

I.  FOREWORD ... 1 

II.  THEORETICAL INTRODUCTION ... 6 

Prototypical Facial Expressions of Emotions ... 7 

Issues of Ecological Validity of Emotion Recognition Studies ... 14 

The Componential View on Facial Expressions of Emotions ... 19 

Summary ... 23 

III.  EXPERIMENTAL SECTION ... 26 

Research aims ... 27 

Collecting samples of dynamic emotional facial expressions ... 27 

The MeMo database ... 27 

Emotional narratives eliciting task ... 27 

Participants to the narration tasks (production study) ... 28 

Laboratory and interview settings ... 29 

Assessment of emotional induction ... 29 

Extracting video sample files from the original films ... 30 

Assessing the message value of spontaneously expressed dynamic displays of emotions ... 32 

Judgment task ... 33 

Participants to the judgment task and rating protocol ... 33 

Reliability analyses ... 34 

Principal Components Factor Analysis ... 35 

Clustering of video files on factor scores ... 39 

Methodology of behavior annotation ... 41 

The Anvil annotation tool ... 41 

Coding scheme ... 41 

Measurement of facial activity ... 42 

Additional Nonverbal Codes ... 45 

Speech and Voice Codes ... 46 

Scoring procedure and reliability assessment ... 47 

Results ... 50 

Descriptions of scores in database ... 52 

Interpretation of FACS Codes with EMFACS/FACSAID ... 56 

Methodological issues in measuring the co-occurrences of Action Units ... 59 


Measuring Co-occurrences of Facial Actions with GSEQ (Generalized Sequential Querier) ... 62 

Contingency table statistics with GSEQ... 62 

From Odds ratios to Yule's Q ... 64 

Testing for the occurrence of EMFACS predicted facial patterns using Yule's Q ... 65 

Relating nonverbal signals to emotion perception ... 70 

Relative frequencies of single action units across the rating clusters ... 70 

Clusters characterization by patterns of action units ... 71 

Prototypical Patterns of Facial Expressions across the Clusters ... 74 

Results for prototypical expressions across the clusters... 74 

Happiness ... 74 

Anger ... 75 

Fear ... 76 

Surprise ... 77 

Sadness ... 77 

Disgust ... 78 

Contempt ... 79 

Masking smiles (blends of smiles with displays of negatively valenced emotions) ... 80 

Summary of results ... 86 

Sequential Analysis of Communicative Behaviours – Methodological Issues ... 88 

Definition of T-patterns. ... 90 

Statistical validation of T-patterns... 92 

Setting up T-patterns detection parameters ... 93 

T-patterns search results and selection criteria ... 94 

T-Pattern statistics by clusters ... 98 

Enjoyment cluster ... 98 

Hostility cluster ... 99 

Embarrassment cluster... 101 

Surprise cluster ... 102 

Sadness cluster ... 104 

Summary of results ... 105 

T-patterns illustrations and comparison by clusters ... 107 

Enjoyment cluster ... 108 

Hostility Cluster ... 112 

Embarrassment Cluster ... 123 


Surprise Cluster ... 129 

Sadness Cluster ... 134 

Summary of findings ... 141 

IV.  GENERAL DISCUSSION AND CONCLUSIONS ... 143 

General Discussion ... 144 

MeMo - a new research database of dynamic facial expressions of emotions ... 145 

The perceived message value of dynamic facial expressions of emotions ... 145 

The role of sequential patterns of communicative actions in the perception of emotions ... 146 

Limitations ... 147 

Future perspectives ... 148 

Conclusions ... 150 

V.  REFERENCES ... 151 

Appendix I. Facial Action Coding System Figures and Definitions of Major Action Units ... 163

Appendix II Frequency Distribution Tables for Event Types in Rating Clusters ... 170

Appendix III. Transition Graphs for T-patterns. ... 194

Appendix IV. Instructions and questionnaires. ... 258

Appendix V. Consent Form ... 269

Appendix VI. Normality tests for Action Units Distribution in Database. ... 271


I. FOREWORD

The human face provides a rich source of information that we use to identify other members of our species and to gather information about gender, age, attractiveness (Rhodes, 2006) or personality traits. Besides static signals extracted from the face, dynamic facial expressions are an important means of communicating and understanding others' intentions and affective reactions (Keltner and Ekman, 2000). Facial displays have been associated with the signaling of emotions and pain (Ekman, 1993), the communication of empathic understanding (Bavelas et al., 1986) and the regulation of conversations (Cohn and Elmore, 1988). They may signal some brain dysfunctions (Rinn, 1984), psychopathological conditions (Ellgring and Gaebel, 1994; Benecke and Krause, 2005) and suicidal intent (Heller et al., 2001). They have also been shown to signal developmental changes in children (Yale et al., 1999; Yale, Messinger, & Cobo-Lewis, 2003), inform person recognition (Cohn et al., 2002) and betray attempts at deceit (Frank and Ekman, 2003).

In recent years, we have witnessed the rapid emergence of interest in the automated analysis and interpretation of facial activity through computer vision. Computer vision is the science of extracting and representing meaningful information from digitized video and recognizing perceptually meaningful patterns. In 1992, the U.S. National Science Foundation convened a seminal interdisciplinary workshop on this topic (Ekman, Huang, Sejnowski, & Hager, 1992), which brought together psychologists with expertise in facial expression and computer vision scientists with an interest in facial image analysis. Since then, there has been considerable research activity, as represented by a series of six international meetings beginning in 1995 devoted to the topic of face and gesture. Several automated facial image analysis systems have been developed (Cootes, Edwards & Taylor, 2001; Essa & Pentland, 1997; Lyons, Akamasku, Kamachi, & Gyoba, 1998; Padgett, Cottrell, & Adolphs, 1996; Wen & Huang, 2003; Yacoob & Davis, 1996; Zhang, 1999; Zhu, De Silva, & Ko, 2002). Most of these systems attempt to classify facial movements into a small set of specific emotion categories, such as joy, anger, surprise, fear or happiness. Of course, the potential economic stakes linked to the development of such technologies are high. Possible commercial applications notably include cameras that take pictures of your friends and family only when they oblige you with a smile, computer tutoring systems that adapt your learning gradient depending on your perceived level of frustration, or artificial agents that attune their reactions to your nonverbally expressed emotions.


In the post-9/11 United States, funds have been granted by Homeland Security for the training of behavioral profilers to screen airport passengers for potential terrorist intent. In the August 14, 2009 issue of the Ottawa Citizen newspaper, reporter Ian MacLeod writes:

«Beginning next year, some air travellers will be scrutinized by airport "behavior detection officers" for physiological signs of hostile intent; in other words, screening for dangerous people rather than just for dangerous objects. [...] Similar programs operate in the United States, the United Kingdom and Israel, which pioneered spying on people's expressions and body movements for involuntary and fleeting "micro-expressions" and movements suggesting abnormal stress, fear or deception. "This might indicate a passenger has malicious intentions," said Mathieu Larocque, spokesman for the security authority, which is responsible for pre-board screening of airport passengers. "It offers an additional security layer for the aviation system."»

In some airports, passengers' voices are already being screened by machines for signs of stress when they are asked questions about terrorist intent. On the Digital Civil Rights in Europe website (www.edri.org), one reads the following (http://www.edri.org/edrigram/number4.7/liedetector):

“Lie detectors will be used in Russian airports as part of the security measures starting with July 2006. Meant to identify terrorists or other types of criminals, a lie-detecting device developed in Israel, known as "truth verifier," will be first introduced in Moscow's Domodedovo airport as early as July. The technology [...] is said to be able to detect answers coming from imagination or memory.”

In the United Kingdom, local social institutions have introduced voice stress analysis to detect fraudulent benefit claims. For example, journalist Les Reid reports (2009):

«A lie detector system designed to root out benefits cheats in Coventry has identified 1,200 dodgy claims in just over a year. The technology detects stress levels in people's voices over the phone and has been used by the city council to assess new Housing Benefit claims since November 2007. Council bosses say the Voice Risk Analysis (VRA) system has dramatically sped up the time taken to process at least 1,700 genuine claims, saving money on paperwork» (Reid, 2009).

Amongst the numerous TV shows praising the feats of forensic science that are so common these days, one especially attracted my attention in 2009. It is called "Lie To Me". Produced by FOX television, the series' main character is Dr Cal Lightman, a scientific expert in reading emotions from the face and detecting subtle signs of deceit from nonverbal behavior. Dr Lightman rents his services to U.S. federal agents and private parties to help resolve different sorts of investigations. Freely inspired by the life of psychologist and facial expressions expert Paul Ekman, the show can boast renowned experts such as Dr Erica Rosenberg, a long-time collaborator of Ekman, as scientific advisors. All this seems to show that the current Zeitgeist is somehow ripe for accepting the spread of human and automated technologies of behavior profiling, possibly as a necessary evil in return for a safer society.

And for those of us who still think we can get away with cheating, the message is clear: we will be caught one way or another. All this is well and good, but how much of it is based on sound science and reliable technologies?

In 2008, Noldus commercialized the first facial emotion expression analysis software, called "FaceReader" (http://www.noldus.com/human-behavior-research/products/facereader). It is based on a classification algorithm developed by Den Uyl and Van Kuilenburg (2005). Out of curiosity, Susanne Kaiser and I invited a commercial agent from the distributing company to our lab to test the system on video records collected for the present thesis. I recall feeling amused, relieved and frustrated at the same time while watching the program detect "anger" whenever the participant in the video was frowning, "surprise" if her eyebrows rose, and "happy" if she happened to smile. I was amused to see how unreliable and arbitrary these interpretations were. I was relieved because it comforted me that I had not spent the last three and a half years cautiously, and I must say painfully, annotating facial actions manually when a computer program could have done it just as well in no time (although I still have hopes that this will soon become at least partially possible). Finally, I was frustrated and upset at seeing how promising technological advances were put to such uninformed use.

Surprising as it may be to computer engineers, behavioral scientists still have many unanswered questions about what an emotion even is and which nonverbal features are essential for communicating affective mental states in social contexts. The present thesis is an attempt to investigate some unsettled issues about the perception of emotions from facial expressions. Even though in real life facial expressions are inherently dynamic, the vast majority of claims about how emotions are communicated through facial patterns come from recognition studies using static photographs of prototypical expressions posed by actors. Even though some questions about the way these expressions are perceived have already largely been addressed, not much is known about how frequently they occur in naturalistic contexts, and therefore what their contribution to communicating emotions in social interactions really amounts to. In this study we place a special emphasis on the relationship between how spontaneously produced facial actions unfold in time and how they are perceived in terms of the emotional messages they convey. Moreover, because faces are usually not perceived in isolation but are integrated with numerous additional nonverbal signals in a multi-channel communicative system, we also consider head and gaze movement/orientation, as well as some speech- and voice-related variables, as communicative signals that may potentially moderate the message value derived from dynamically perceived facial expressions of emotions.

The study described in this thesis can be divided into four sections. First, we begin with a theoretical introduction questioning the validity of generalizing results of traditional emotion recognition studies to emotional signal processing in real-life situations. Second, we present our strategy for collecting spontaneous facial expressions produced during an emotion-sharing task. We describe a new audio-video database called MeMo created for the needs of this thesis. MeMo consists of 200 sample files extracted from 50 face-to-face semi-structured interviews conducted with 10 female participants, each narrating five autobiographical emotional episodes. All the extracted video samples were pre-selected on the basis of agreement between two independent judges asked to identify sequences in the interviews where participants appeared "emotional". All the facial actions occurring in these files were then annotated with the Facial Action Coding System as well as with additional codes designed specifically for this study. We then report the results of a multi-scalar rating judgment study conducted to assess (a) whether independent judges could agree on the message value of the emotions (if any) communicated in the sample files, and (b) whether, on the basis of these judgments, we could create distinct clusters of video records with similar rating profiles in order to conduct quantitative comparisons of the facial and related nonverbal actions contained in these groups. In section three, we report our attempt to compare groups of videos with distinctively different rating scores on five emotional factors, in terms of both traditional prototypical patterns of facial configurations and dynamic patterns of multimodal communicative actions detected by the T-pattern detection algorithm developed by Magnusson (2005). An emphasis is put on the different types of information derived from these two types of analysis. Finally, in the discussion section, the main results of the study are reviewed and potential implications for future research are discussed.


II. THEORETICAL INTRODUCTION


Prototypical Facial Expressions of Emotions

In psychology, the prevalent conception of emotions assumes that there exist a limited number of fundamental or basic emotions that are neurologically pre-wired and that are expressed through distinctively recognized facial expressions. Initial support for this view started to emerge in the early 1970s with the first research reports suggesting that a limited set of emotion terms could be matched above chance level with static photographs of faces of individuals posing emotions corresponding to six English affective terms: happiness, fear, anger, surprise, disgust and sadness. Later work led to the addition of a seventh emotion, "contempt", to the list (Ekman and Friesen, 1986; Ekman and Heider, 1988; see figure 1 for an illustration).

The results of these early research projects, inspired by the neo-Darwinian theories of Tomkins (1962, 1963) and conducted by psychologists Paul Ekman (1972, 1994; Ekman et al., 1987) and Carroll Izard (1971) across various literate and illiterate cultures that had had little contact with each other, revived interest in the search for universal invariants in the way facial displays communicate emotions, at a time when socio-constructivist models viewing emotional behavior as determined solely by cultural influences on expressive prescriptions were predominant (Lutz & White, 1986; Wierzbicka, 1994).

Since these seminal studies, replications by other research groups have generally produced results congruent with the original findings. Notably, a meta-analysis by Elfenbein and Ambady (2002) performed on 87 articles describing 97 separate studies on cross-cultural emotion recognition supports the empirical claims made in favor of cross-cultural recognition of emotions, suggesting that certain core components of facial expressions of emotions are universal. The conclusion that prototypical facial displays can be reliably and cross-culturally associated with predicted emotion labels or appropriate emotion-eliciting scenarios is usually taken as evidence that at least some facial patterns function as innate and phylogenetically evolved signals for the communication of emotions. Based on his empirical work and the theoretical intuitions of Tomkins, Ekman proposed a neuro-cultural account of emotions (1972) positing a dual influence of psycho-physiological and socio-cultural mechanisms as explanatory causes for both "universal" invariants and "culture" specifics in facial displays of emotions. The neuropsychological component of the model posits the existence of facial affect programs, or FAPs, that are automatically activated when an emotion is triggered. These FAPs are essentially hypothetical neuro-motor programs triggered during an emotion episode and considered responsible for organizing the full facial response patterns distinctively characteristic of a small set of fundamental or "basic" emotions. The very notion that there exist a limited number of emotions sharing a more basic or fundamental status than others is essentially derived from Ekman's interpretations of the implications to be drawn from the thesis of a universally recognized set of facial emotional expressions.

The neuro-cultural model views it as a sine qua non requirement for the inclusion of an affective state in the category of emotions that evidence be produced that this state possesses a distinctive expressive signal that can accurately be recognized cross-culturally. When such evidence can be provided, it is taken as strongly suggestive evidence that the label referring to the emotion emerged as a semantic transliteration or rendering of some naturally preexisting, phylogenetically evolved and innate response. Additional indirect evidence cited in favor of phylogenetically evolved «facial affect programs» comes from observed similarities between human and nonhuman emotional displays (Chevalier-Skolnikoff, 1973; Darwin, 1872/1965; Redican, 1982; Parr, Waller, Vick, & Bard, 2007; van Hooff, 1972; Waller & Dunbar, 2005) and the mutual recognition of emotional signals across species boundaries (Itakura, 1994; Linnankoski, Laasko, & Leinonen, 1994). Note, however, that strictly speaking these studies provide evidence for the existence of cross-species similarities in the forms and functions of communicative signals, not that these signals are related to the theoretical construct of emotion.

 

Figure 1. Prototypical expressions for seven «basic» emotions according to Paul Ekman's predictions (1994). From left to right: Surprise, Anger, Disgust, Fear, Sadness, Happiness, and Contempt. The expressions were generated with the FACSGen tool (Roesch et al., 2006).

A recent study also showed that congenitally blind, non-congenitally blind, and sighted athletes photographed while receiving medals during the Paralympic and Olympic Games produce similar prototypical expressions of emotions, seemingly excluding the possibility of social modeling of the expressions (Matsumoto & Willingham, 2009; but see Mistschenka, 1933 for contradictory observations). Also, recent dissection work on 18 human cadavers shows that, independently of the fact that facial musculature is far from consistent between individuals in terms of both presence and symmetry (McAlister, Harkness, & Nicoll, 1998; Pessa, Zadoo, Adrian, Yuan, & Garza, 1998; Pessa et al., 1998), the muscles essential for producing prototypical facial displays of emotions vary little among individuals. In this study, all examined cadavers were equipped with the facial muscles necessary to produce the required actions, almost always exhibited these muscles bilaterally, and exhibited minimal size asymmetry. In contrast, muscles not essential for the production of facial prototypes of emotions were inconsistently present and were often asymmetric in presence and size (Waller, Cray, & Burrows, 2008). This explains how universally recognized facial expressions can be produced even in light of individual variation in facial musculature. In a 1999 text promoting the construct of basic emotions, Ekman unequivocally restates:

“I have gone back and forth on the question of whether or not a universal signal is the sine qua non for emotion. Once again, I will set out that claim, as a challenge for someone to identify states which have all the other characteristics (of emotions)... but no signal. To date there is no such evidence and I doubt it will be found. I believe it was central to the evolution of emotions that they inform conspecifics... about what is occurring inside the person, what most likely occurred before to bring about that expression, and what is most likely to occur next.” (Ekman, 1999).

Paul Ekman often refers to himself as an upholder of the ideas Charles Darwin expounded in The Expression of the Emotions in Man and Animals (1872/1996). In the 1996 edition of the book published by Oxford University Press, Ekman even authored an extensive commentary on Darwin's text based on his own views of basic emotions. Interestingly enough, attentive readers of Darwin have highlighted the fact that the notions of basic emotions and prototypical expressions were probably foreign to Darwin's mind. For example, Michel Heller states:

“Darwin so loved to describe minutely what he observed that he could never have contented himself with reducing emotional expressions to a few traits, or to a few basic emotions. What he includes in his list of emotional expressions is both extremely varied and heterogeneous. He would rather tend to believe that he has not been able to analyze everything, and that reality is even more differentiated than what he managed to describe.” (Heller, 2008).

Whatever opinion one may hold about Ekman's arguments, the principal strength of his work has probably been to reaffirm Darwin's proposition that emotions are an appropriate object of study for the natural sciences. Even though Ekman puts a strong emphasis on the biological determinants of facial expressions of emotions, he does not altogether reject the importance of culture, as is reflected in the second part of his model's name. The neuro-cultural model acknowledges that both cultures and institutions promote implicit and explicit expectations which are meant to influence the ways in which emotional episodes are acted out in the interpersonal arena. The notion of cultural display rules was first introduced by Ekman and Friesen (1969) as a hypothetical construct to explain the observed differences in facial expressive styles in a study comparing Japanese and American students. Since that time, the notion of display rules has become a central concept in the study of culture and emotion. Cultural display rules can be defined as culturally prescribed rules, learnt early in life through socialization, which influence the emotional expression of people from any culture depending on what that particular culture has characterized as an acceptable or unacceptable expression of emotion (Matsumoto, Kasri, & Kooken, 1999).

These culturally shared norms dictate how, when, and to whom people should express their emotions. Note that, in keeping with the notion of a set of universal emotions communicated through prototypical facial patterns, the concept of display rules does not extend to the initial shaping of an emotion display per se. The "how" in the sentence "how... people should express their emotions" is not meant to refer to the canonical form of the emotional expression, considered an innate and phylogenetically inherited pattern produced identically across individuals and cultures. Rather, display rules are inferred from the operation of modulation strategies applied to an expressive response already triggered by an emotion. Several modulation strategies meant to alter the supposedly natural course of an expression have been described, such as acting out an «unfelt» emotion, as in social or polite smiling; trying to suppress the expression by activating counteracting muscles; minimizing or maximizing the amplitude and/or duration of a response; and masking negative displays with social smiles. Interestingly, the same meta-analysis by Elfenbein and Ambady (2002) that seems to confirm a minimal universality in the recognition of core elements of facial expressions of emotions also provides unaccounted-for evidence that emotional expressions may lose some of their meaning across cultural boundaries. For example, these authors report that some facial displays are more accurately understood when they are judged by members of the same national, ethnic, or regional group that had expressed the emotion.

According to Ekman's model, this in-group advantage effect ought to be explained by a shared knowledge of culture-specific display rules among individuals of the same culture. This interpretation of the in-group advantage in decoding facially expressed emotions was challenged by two subsequent studies that explored not only how emotions are perceived but also how they are produced cross-culturally (Elfenbein et al., 2007). Surprisingly little research has examined cross-cultural differences in actual (not self-reported) emotionally expressive behaviors (the most often cited of these studies, Ekman, 1972, was not published in a peer-reviewed journal). In the study by Elfenbein and colleagues, participants from Quebec and Gabon were asked to pose facial expressions of emotions. Group-specific displays, in the form of activation of distinct muscle patterns for the same expression, emerged most clearly for serenity, shame, and contempt, and also for anger, sadness, surprise, and happiness, but not for fear, disgust, or embarrassment. In a second study, Quebecois and Gabonese participants were asked to recognize these expressions as well as expressions standardized to erase the cultural specificities. Results showed that an in-group advantage emerged for non-standardized expressions only, and most strongly for expressions with greater regional expressive specificities. These authors have interpreted these results as suggesting the existence of nonverbal dialects showing cultural variations similar to linguistic dialects, thereby decreasing accurate recognition by out-group members.

From the early 1970s to this day, theoretical claims in favor of the existence of basic emotions have depended largely on the convergence of results from cross-cultural studies where participants are asked to judge pre-selected displays from static faces. Typically, links between facial expressions and self-reported emotional experience are at best moderate (Rosenberg, 2005). In a recent review of 257 published papers covering the 1997-2007 period, Eva Bänninger (2009) showed that studies on facial expressions were dominated by either judgment or production studies (N=158, 61%). Only 38 (15%) combined the measurement of actual facial behavior with impression formation. This easily produces a problem of circularity, since production studies usually rely on coding systems and interpretation tables derived from the results of these judgment studies to select which behaviors to observe and, subsequently, how to make sense of their data.


Ekman's arguments in favor of the universality of a small set of basic emotions, signaled by corresponding patterns of facial expressions, have not gone unchallenged by alternative psychological models of emotions. For example, according to Russell's dimensional model, facial expressions are not seen as expressing distinct emotions; nevertheless, observers do infer much about the expresser from the face (Carroll and Russell, 1996).

According to this view, one can extract two kinds of information from the face easily and automatically. First, quasi-physical information is perceived. That is, the observer can see from the eyes whether the expresser is weeping, winking, looking down or up, staring at something, or looking away. The mouth shows whether the expresser is talking, shouting, yawning, laughing, smiling, or grimacing. Carroll and Russell (1996) refer to such information as quasi-physical to indicate its simplest literal meaning. Thus, as quasi-physical, the smile is recognized simply as a smile, not as a smile of joy, of embarrassment, of nervousness, or as a polite greeting. Second, based in part on perceived quasi-physical features, the observer infers the expresser's feelings on the general dimensions of pleasantness and arousal. Any further differentiation of a displayed emotion is considered to be inferred from additional contextual cues according to Russell's dimensional model.

According to Frijda's definition, emotions are essentially states of action readiness; facial expressions are seen as reflecting an intention to act in a certain way. Using Ekman's facial prototypes of basic emotions, Frijda showed that participants could reliably associate the displays with particular states of action readiness. For example, disgust and fear were associated with the tendency to "avoid" and "protect oneself", and happiness with the desire to "approach" and "be with" (Frijda and Tcherkassof, 1997).

Even though their theories differ in what they predict Ekman's prototypical expressions should signal, what these researchers share is the use of recognition studies to back up their particular claims as to what an emotion is. This may prove inappropriate, since the only thing that these studies convincingly show is that not only emotions but also levels of arousal and hedonicity, cognitive appraisals, and action tendencies can be inferred from prototypical facial emotional expressions (see Scherer and Grandjean, 2008). This suggests that results from recognition studies may be compatible with several theoretical models and thus inadequate for testing the competing theories. As for the investigation of correspondences between self-reported feelings and theoretically predicted facial displays, inconsistent results come up. Some researchers report a weak (Bonanno & Keltner, 2004; Frijda & Tcherkassof, 1997; Kappas, 2003) to moderate link (Rosenberg, 2005). For instance, Fernandez-Dols, Sanchez, Carrera and Ruiz-Belda (1997) found no coherence between the subjective reports of participants watching emotion-eliciting movies and their facial expressions. As an example, two participants displayed a prototypical expression of surprise while reporting feeling disgust.


Issues of Ecological Validity of Emotion Recognition Studies

Emotional signals in social interactions are typically not conveyed by specific facial patterns alone but by a complex combination of rapidly changing and overlapping individual facial actions integrated with other nonverbal cues. The well-established empirical evidence demonstrating that a limited number of discrete emotion categories can be cross-culturally recognized from facial configurations alone rests on the results of emotion recognition studies using static photographs of posed expressions presented in isolation and pre-selected for maximum discriminability (Barret, Lindquist and Gendron, 2007).

The generalization of these recognition data to emotional signal processing in more realistic social contexts is questionable on several grounds. First, most of the standardized facial stimuli used in laboratory experiments (most often from Matsumoto & Ekman's (1988) JACFEE set or Ekman & Friesen's (1976) Pictures of Facial Affect) were produced by actors who followed strict guidelines, detailed in the «Directed Facial Action Task» protocol (Ekman, 2007), on how to pose each facial expression corresponding to Ekman's prototypical set of basic emotions.

By contrast, naturally occurring facial expressions are often of weaker intensity and less clear-cut, and their interpretation is more elusive and ambiguous than that of posed expressions (Nummenmaa, 1992; Hess and Kleck, 1994; Russell, 1997). Indirect evidence of this is that drastic drops in, or even the disappearance of, inter-rater agreement for specific emotion labels have been reported when spontaneously produced facial expressions are used instead of posed expressions (Motley and Camden, 1988; Motley, 1993; Yik, Meng and Russell, 1998).

Standardized posed expressions are more easily recognized than spontaneous ones, probably because they act as super-stimuli by exaggerating the features of the emotion type they depict. As Ekman (1972, 1989) stresses, they possess a "snapshot quality" that fosters instant recognition. Second, several studies using well-established facial coding systems like FACS to specify the configurations of spontaneously produced expressions report little evidence for the existence of the specific prototypical expressions predicted by proponents of basic emotion theories (Matias and Cohn, 1993; Camras et al., 2002; Scherer and Ellgring, 2007a).

Third, in real-life situations, facial displays are only one component of an integrated, complex, multi-channel, multi-signal communicative system, in which additional components provide a possible context for modulating the perceived meaning of facial displays. Despite this fact, most research on emotion recognition has focused on isolated modalities, mostly facial and vocal, at the expense of other communicative channels. The few studies that have investigated the combination of facial displays with additional nonverbal signals suggest that head orientation (Hess, Adams and Kleck, 2007), body postures (Aviezer et al., 2008), head positions (Krumhuber, Manstead and Kappas, 2007), and gaze orientation (Reginald and Kleck, 2005) all have a modulating impact on the meaning derived from facial displays. For example, the role of horizontal head tilt in the perception of facially expressed emotions was examined by Hess, Adams and Kleck (2007). Head position was found to strongly influence reactions to anger and fear but less so for other emotions. Direct anger expressions were more accurately identified, perceived as less affiliative, and elicited higher levels of anxiousness and repulsion, as well as less desire to approach, than did averted anger expressions. Conversely, for fear expressions, averted faces elicited more negative affect in the perceiver. The authors conclude that their findings suggest that horizontal head position is an important cue for the assessment of threat. Additionally, Reginald and Kleck (2005) have demonstrated that the way in which gaze direction influences emotion perception actually depends on the specific type of emotion in question. They show that direct gaze enhances the perception of anger, whereas averted gaze enhances the perception of fear expressions.

These patterns of findings are explained according to the perspective that emotional expressions and gaze behavior communicate basic behavioral intentions to approach or avoid. Thus, when congruent in signal value, gaze direction acts to enhance the perception of the emotion communicated by the face. Gaze direction influences anger and fear perception because it indicates the source of threat, as part of an early warning mechanism, whereas for joy and sadness, gaze may simply be a social signal indicating a tendency for social engagement. In this example, averted gaze may enhance the perception of fear because it helps indicate the source of potential threat via joint attention (see Driver et al., 1999), whereas averted gaze may enhance the perception of sadness because it indicates social withdrawal.


Keltner (1995) was the first to provide evidence that two distinct emotions sharing the same facial signal of smiling could only be accurately differentiated on the basis of additional behavioral cues. He found that when people report embarrassment, they show a consistent pattern of behavior distinct from that of amusement. When embarrassed, people look down, then smile and simultaneously attempt to control the smile with facial actions that are antagonistic to the upward pull of the zygomatic muscle, then turn their head away and touch their face. Follow-up research has shown that observers are able to discriminate displays of spontaneous embarrassment and amusement. This suggests that an important part of the embarrassment signal might be the sequential unfolding of its multimodal component actions. The same emphasis on temporal unfolding that was useful in differentiating different kinds of smiling could also be of interest for understanding whether and how observers attribute different emotional values to morphologically similar facial actions depending on their sequential organization and pairing with other nonverbal actions. Along the same lines as what Keltner (1995) showed with embarrassment, recent work suggests that positive affect states that were not previously considered basic emotions can possibly be identified by specific patterns of expressive actions (Shiota, Campos, & Keltner, 2003).
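As an illustration of how such a sequential signature differs from a static configuration, the sketch below (Python; the event labels are invented for illustration and do not correspond to the coding scheme used later in this thesis) checks whether a stream of annotated nonverbal events contains, in order, the embarrassment sequence described by Keltner (1995).

```python
# Hypothetical sketch: test whether annotated nonverbal events follow, in order,
# the embarrassment display described by Keltner (1995): gaze down, then a smile
# accompanied by a smile control, then head aversion, then a face touch.
# The event labels below are invented for this illustration.

EMBARRASSMENT_SEQUENCE = ["gaze_down", "smile", "smile_control",
                          "head_away", "face_touch"]

def contains_ordered_subsequence(events, template=EMBARRASSMENT_SEQUENCE):
    """True if every step of the template occurs in the event stream,
    in the template's order (other events may be interleaved)."""
    position = 0
    for event in events:
        if position < len(template) and event == template[position]:
            position += 1
    return position == len(template)

if __name__ == "__main__":
    observed = ["gaze_down", "smile", "speech", "smile_control",
                "head_away", "face_touch"]
    print(contains_ordered_subsequence(observed))                 # True
    print(contains_ordered_subsequence(["smile", "gaze_down"]))   # False
```

An ordered-subsequence test of this kind captures the idea that the diagnostic information lies in the relative timing of the component actions rather than in their mere co-occurrence.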

For example, Tracy and Robins (2004, 2008) found that an action pattern involving a small smile, the head tilted back, the arms raised or akimbo with hands on hips, and a visibly expanded posture could be reliably interpreted as an expression of pride. Because they were able to reproduce these results cross-culturally, these authors argue that the term «pride» should be added to the traditional list of basic emotions.

Another under-investigated hypothesis is that, in social conversations, the verbal communication of the circumstances and evaluation of a situation serves to reduce the uncertainty inherent in some facial expressions and to constrain their meaning so as to allow for quick categorization of emotion (Lindquist et al., 2006). Indirect evidence in favor of this hypothesis is that, when given the opportunity, judges invent plausible eliciting scenarios when presented with prototypical emotional expressions (Frijda and Tcherkassof, 1997). It therefore only takes a small leap to assume that when an actual eliciting event is known, it will be taken into account in interpreting someone's facial expressions.

Finally, when it comes to producing relevant empirical data about how emotions are perceived through the face, traditional judgment studies using static stimuli do not capitalize on the fact that the natural dynamic component of facial expressions provides unique information about a sender's mental state that is not available in static displays (Ekman, 1982). In natural settings, the face moves and shifts, sometimes quickly, from one expression to another. In other words, observers in natural environments perceive the social signals conveyed by the face not as static stimuli but as complex action patterns unfolding in time.

Thus, the sequential unfolding of facial actions provides observers with different information from that provided by static photographs, since still expressions do not present subtle changes. It may be that differences in the social information displayed by static and dynamic expressions lead to differential effects on emotion perception.

Indeed, preliminary investigations suggest that the dynamic aspects of facial displays are likely to be of importance (Bassili, 1978, 1979; Buck, Miller, & Caul, 1974; Hess & Kleck, 1990; Kamachi et al., 2001). For example, Edwards (1998) has shown that observers are sensitive to subtle changes in a person's facial expression. When asked to assess the temporal progression of emotional facial displays, participants were able to detect extremely fine dynamic cues. This led the author to assert that facial expressions of emotion are temporally structured in a way that is both perceptible and meaningful to an observer.

The relevance of temporal aspects has also been stressed in research conducted by Wehrle et al. (2000) on emotion perception from schematic facial expressions. The results support the claim that dynamic displays improve the recognition and differentiation of the facial patterns of emotions as compared to static displays (see also Ambadar, Schooler, and Cohn, 2005; Bould and Morris, 2008; Lemay, Kirouac, and Lacouture, 1995). Evidence is also starting to accumulate concerning the importance of dynamic parameters for observers' categorization of subtle facial expressions and their judgments of genuineness (Krumhuber and Kappas, 2005) and trustworthiness (Krumhuber et al., 2007).

The relevance of relative timing emerged from studies showing that humans are sensitive to the duration of a facial display when considering the sincerity or deceptiveness of an emotional display (Ekman, Friesen, and O'Sullivan, 1988). Ekman and Friesen (1982) have suggested that social or polite smiles are sometimes obvious because of their short onset and irregular offset times, which convey a lack of authenticity. Cohn and Schmidt (2004) have shown that spontaneous smiles have smaller amplitude and present a more linear relation between amplitude and duration than deliberate smiles. Hess and Kleck (1990) have also pointed out the importance of the dynamics of facial movements, and particularly the irregularity, or phasic change, in an expression's unfolding.


Thus, pauses and stepwise intensity changes, for example the number of onset, offset and apex phases that the expression contains, came out as significant parameters (see also Frank, Ekman, and Friesen, 1993; Hess et al., 1989; Messinger, Fogel, and Dickson, 1999).
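To make these dynamic parameters concrete, the following sketch (Python) is a simplified illustration under the assumption that a smile can be summarized by a sampled AU12 intensity trace; it is not the measurement procedure used in this thesis. It derives an amplitude value and counts onset, apex and offset phases from such a trace.

```python
# Illustrative sketch: estimate amplitude and count onset / apex / offset phases
# from a sampled AU12 (lip-corner puller) intensity trace, the kind of dynamic
# parameters discussed by Hess and Kleck (1990) and Cohn and Schmidt (2004).

def smile_dynamics(intensity, rise_eps=0.05):
    """Label each frame-to-frame step as onset (rising), offset (falling)
    or apex (approximately flat), and return amplitude plus phase counts."""
    phases = []
    for prev, curr in zip(intensity, intensity[1:]):
        delta = curr - prev
        if delta > rise_eps:
            label = "onset"
        elif delta < -rise_eps:
            label = "offset"
        else:
            label = "apex"
        # Count a new phase only when the label changes between steps.
        if not phases or phases[-1] != label:
            phases.append(label)
    return {
        "amplitude": max(intensity) - min(intensity),
        "n_onset": phases.count("onset"),
        "n_apex": phases.count("apex"),
        "n_offset": phases.count("offset"),
    }

if __name__ == "__main__":
    # A smooth smile: one onset, one apex, one offset.
    smooth = [0.0, 0.3, 0.6, 0.9, 0.9, 0.9, 0.6, 0.3, 0.0]
    # An irregular smile: stepwise intensity changes produce extra phases.
    stepwise = [0.0, 0.4, 0.4, 0.8, 0.8, 0.5, 0.5, 0.2, 0.0]
    print(smile_dynamics(smooth))
    print(smile_dynamics(stepwise))
```

On such a representation, an irregular, stepwise smile yields several onset and apex phases where a smooth one yields a single onset-apex-offset run, which is the kind of difference the studies above relate to perceived spontaneity.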

All of these studies point to the possibility that the perception of emotions from static and dynamic facial stimuli might involve distinct cognitive processing strategies, and that until recently researchers in the behavioral sciences may have seriously underestimated the importance of context and motion dynamics for making sense of the subtle or otherwise ambiguous facial expressions that permeate real-life situations. In the next section, we introduce an alternative theoretical account of the relationship between facial displays and emotions that takes the unfolding of facial expressions in time into account in its formulation.


The Componential View on Facial Expressions of Emotions

Recognition studies of emotions have been largely limited to a set of 7 (±2) prototypical facial patterns and minor variants on them. At the same time, Ekman (1972) and Izard (1994) acknowledged that each emotion is associated with more than one facial pattern. Indeed, Ekman and Friesen (1978) originally listed 55 facial patterns for six emotions (30 for anger, 8 for sadness, 6 each for fear and disgust, 3 for surprise, and 2 for happiness); this count ignores possible variations in head and eye movements and variations in the degree of mouth opening. Several predictions for incomplete prototypes have been proposed by Ekman and Friesen in their 2003 book Unmasking the Face: A Guide to Recognizing Emotions from Facial Expressions. This text contains detailed descriptions of partial prototypes centering on the brow, eye or mouth region. Multiple patterns for a single emotion raise a conceptual problem: which pattern occurs in a given instance of the emotion, and why? For example, of the 6 variants predicted for disgust, what determines which one actually occurs on a specific occasion? Furthermore, facial expressions outside the predicted set of 55 may also occur. If so, Ekman and Friesen's (1978) analysis may not specify the full set of patterns that an observer will attribute to a specific emotional category. Nonetheless, Ekman (1980) was clear that all the patterns for a given emotion should be quite similar.

One characteristic trait of the still facial images provided by Ekman and colleagues is that they show global patterns. The typical facial expressions used in most recognition studies are the result of different muscles acting to move the brows, eyelids, cheeks, and mouth, converging simultaneously to their maximum point of contraction. Ekman and Friesen (1978) developed a system for analyzing a facial display into its constituent movements, called action units (AUs). To illustrate, figure 3 shows how the prototypical facial pattern of anger (figure 2) can be decomposed into four different facial actions or AUs: AU4 (lowering the brows), AU5 (raising the upper eyelids), AU17 (raising the chin), and AU23 (tightening the lips).


Figure 2. The prototypical anger pattern AU4+5+17+23.

Figure (3). The four individual action units that constitute the pattern shown in figure (2).
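To make the decomposition concrete, the short sketch below (Python; the prototype dictionary is a hypothetical stand-in, with only the anger entry taken from the figures above) represents a scored facial configuration as a set of FACS Action Units and tests whether it contains the full anger prototype AU4+5+17+23.

```python
# Illustrative sketch (not part of the thesis materials): a facial configuration
# is represented as a set of FACS Action Units, and we test whether it contains
# the anger prototype AU4+5+17+23 decomposed in Figures 2 and 3.

# Hypothetical prototype table; only the anger entry is taken from the text.
PROTOTYPES = {
    "anger": {4, 5, 17, 23},  # brow lowerer, upper lid raiser, chin raiser, lip tightener
}

def matching_prototypes(observed_aus, prototypes=PROTOTYPES):
    """Return the prototype labels whose full AU pattern is contained
    in the observed configuration."""
    observed = set(observed_aus)
    return [label for label, pattern in prototypes.items()
            if pattern <= observed]

if __name__ == "__main__":
    # A configuration scored as AU4+5+7+17+23 still contains the anger prototype.
    print(matching_prototypes({4, 5, 7, 17, 23}))   # -> ['anger']
    # A partial display (AU4 alone) contains no full prototype.
    print(matching_prototypes({4}))                  # -> []
```

Representing configurations as AU sets in this way also makes it easy to see why partial displays, discussed below, fall outside a strict prototype-matching logic.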

There is an alternative account of facial behavior that does not predict one specific facial pattern for each emotion and can thus explain the existence of multiple patterns. Indeed, it raises the possibility of even more diversity, including the frequent occurrence of no facial action, single actions, and small combinations of muscle groups. This alternative is commonly referred to as the "componential" view. The central claim of the componential view of facially communicated emotions is that single elements of facial expressions might convey meaningful information at a more molecular level than full-blown prototypes (Smith and Scott, 1997).

Componential accounts of facial expressions of emotions are derived from appraisal theories of emotion (for a review see Scherer, 2001). Appraisal theories claim that emotions are elicited and differentiated by conscious and/or nonconscious evaluations of events and situations. Although different appraisal theories vary with regard to both the number of appraisal dimensions and their exact definitions, there is substantial overlap (Frijda, 1986; Ortony & Turner, 1990; Russell, 1997; Scherer, 1984; Smith, 1989; Smith & Scott, 1997). Among appraisal theories, Scherer's Component Process Model (CPM) is of particular interest because it provides specific predictions linking individual facial actions with appraisal dimensions (see Table 1, extracted from Wehrle et al., 2000). According to that model, the temporal order in which individual facial actions unfold is seen to reflect an individual's ongoing cognitive processing of emotionally relevant stimuli (Scherer, 1992, 2009). The CPM's appraisal categories can be broadly grouped into four classes: appraisals related (1) to the intrinsic properties of the stimulus, such as novelty and pleasantness; (2) to the significance of the event for the individual's needs and goals; (3) to the individual's ability to cope with the consequences of the event; and (4) to the compatibility of the event with social and personal norms and values. According to appraisal theory, it is the subjective evaluation of an event as pleasant or unpleasant, as conducive or obstructive to one's goals, as changeable or not, and as compatible or incompatible with social and personal norms that determines the type of emotion that is experienced.

Thus, an event that is appraised as pleasant and goal-conducive elicits joy, whereas one that is appraised as goal-obstructive and as difficult or impossible to redress elicits sadness (Scherer, 1992). In the CPM, an emotion episode is the result of a momentary synchronization of functionally distinct components, including cognitive appraisals, subjective feelings, physiological changes, pre-motor activation preparing for action, and facial expressions.

No one component will be common to all instances of any one type of emotion, and each component can function independently of any other and in the absence of any emotional feeling. If facial movements are the direct outcomes of an appraisal process, an emotion is therefore expressed in the face only indirectly, through its correlation with the other defining components.

To illustrate, let us return to Figure 2. Componential theory posits that several AUs are concomitant with a cognitive appraisal. In the first panel of Figure 3, the brows are lowered and brought together in an action called AU4 in the FACS system. The CPM predicts that this could signal that the person is appraising an event as unexpected, unfamiliar, unpleasant, or as obstructing his or her goals. The raising of the upper eyelid, referred to as AU5, in the second panel of Figure 3 could reflect an attentional response to a sudden change in the environment.


Note that, for Smith and Scott (1997), the appraisal component can initiate a facial expression even in the absence of any underlying emotional feeling. On the other hand, a feeling of anger dissociated from the activation of additional components would produce no facial behavior at all. Put more formally, components are necessary and sufficient for facial action; emotions are neither necessary nor sufficient for facial action. According to this view, prototypical patterns of facial actions can arise only secondarily, through the coincidental combination of two or more dimensions of the appraisal process. Table 2 provides some predictions for facial action patterning corresponding to the basic categories identified by Ekman. Of course, these are just hypotheses to illustrate the componential view of facial expression. Even though some encouraging data have been reported (see Lanctôt and Hess, 2007; Aue and Scherer, 2008; Delplanque et al., 2009), most of the details remain to be established empirically. Nevertheless, this view opens up the possibility that partial expressions that would be seen as meaningless when considered in isolation could in fact still be quite informative when preceding and following actions are taken into consideration. Authors in this componential tradition have called for more research on the temporal dynamics of facial expressions (Kaiser and Wehrle, 2008) and their possible combinations with other expressive modalities (Scherer and Ellgring, 2007b).


Summary

Besides speech, we notably use facial expressions, gaze direction, vocal modulation and body segment orientation to interact with others. One characteristic of human communicative abilities is to combine actions from different behavioral modalities into specific patterns that involve either some temporal overlap or a sequence. For example, a vocal emphasis on a word might begin and end within a bilateral rising of the brows, or a gaze at a person's face might contain a smile that is followed by downward movements of the head and eyes. To date, little attention has been given to the temporal sequence in which facial actions unfold and to how they are coordinated with head and eye motions. Such coordinated patterns may be perceived as communicating specific emotional meanings, but relevant research is scarce. In this thesis, we attempt to provide detailed examples of the ways dynamic facial expressions of emotions are produced and perceived. By extracting and representing the sequential unfolding of facial and other nonverbal actions during spontaneous emotional displays, nonverbal analysis can begin to discriminate among the message values of otherwise undetected features of expressive actions. This is a critical step if we are to move from simple prototypical expression recognition to the interpretation of dynamic and naturally occurring expressions.


Table 1. Scherer's componential model predictions linking cognitive appraisals with facial muscle group activations (AU numbers). Retrieved from Wehrle et al. (2000).

Novelty (Outcome A: High / Outcome B: Low)
- Suddenness: High: 1 + 2 + 5 + 26 / 27 / 38 / gazing at | Low: NA
- Familiarity: High: NA | Low: 4B + 7
- Predictability: High: NA | Low: 4B + 7

Intrinsic pleasantness (Outcome A: High / Outcome B: Low)
- Taste: High: 6 + 12 + 25/26 | Low: (9 + 10) + 16 + 19 + 26
- Sight: High: 5A + 26A | Low: 4 + 7/(43)/44 + 51/52 + (61/62)
- Smell: High: 26A + 38 | Low: 9 + (10) + (15 + 17) + (24) + 39
- Sounds: High: 12 + (25) + 43 | Low: any combination of the others

Goal significance
- Goal relevance: High: focusing responses, lower intensity of the cumulation of the first two SECs | Low: NA
- Outcome probability: Probable: higher intensity for future responses | Not probable: lower intensity for future responses
- Expectation: Consonant: NA | Dissonant: reactivation of the novelty response, 1 + 2 + 5 or 4B + 7
- Conduciveness: Conducive: 6 + 12 | Obstructive: 4C (long) + 7 + 17 + 23/24
- Urgency: Urgent: intensification, high tension | Not urgent: deamplification, low tension

Causality
- Agent: Self or non-human: less intense than external personal attribution | Other person: intensifies future responses, more intense than self or non-personal attribution
- Motive: Non-intentional: diminution of intensity of existing and future responses | Intentional: intensifies existing and future responses, more intense than non-intentional

Coping potential
- Control: Low: 15 + 25/26 + 43B/43C + 54 + 61/62 + 64, or 1 + 4 | High: 4 + 5, or 7 + 23 + 25
- Power: Low: 20 + 26/5 | High: NA
- Adjustment: Low: holding the existing pattern | High: deamplification

Standards compatibility (Outcome A: achieve, comply with or surpass standards / Outcome B: fail to achieve or violate standards)
- Self: 17 + 24 (+53) | 14 / 43A/43C + 54 + 55/56 + 61 + 62 + 64
- Self: 17 + 24 (+53) | 41/42/43 + 54 + 55/56 + 61/62 + 64
- Other: direct gaze at, 1 + 2 + 5 + 26 | 4 + 10U + (12) + (53 + 64) / 12U / 14U


Table 2. Component Process Model: facial Action Unit predictions for five modal emotions. Derived from Scherer (2001).

Emotion prototype | Predicted appraisal¹ | Predicted sequence²
Disgust | Unfam./Discrepant | 4+7
Disgust | Unpleasant | 9, 10, 15, 39
Disgust | High coping | 23+25, 17+23, 6+17+24
Anger | Sudden | 1+2
Anger | Discrepant | 4+7
Anger | Obstructive | 17+23, 17+24
Anger | High coping | 23+25, 17+23, 17+24
Fear | Sudden | 1+2
Fear | Unfam./Obstructive | 4+7
Fear | Low coping | 20, 26, 27
Sadness | Discrepant | 4+7
Sadness | Obstructive | 17+23, 17+24
Sadness | Low coping | 20, 26, 27
Happiness | Sudden | 1+2
Happiness | Pleasant | 12+25
Happiness | Conducive | 6+12

1. Predicted appraisal: antecedents postulated by the CPM. 2. In this column, the order of the listed action units suggests their relative temporal position (+: simultaneous AUs; ,: alternative AUs).
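A minimal sketch of how such predictions can be consulted computationally is given below (Python); only the disgust and happiness rows of Table 2 are transcribed, and the dictionary layout is an assumption of this illustration rather than part of the CPM itself.

```python
# Minimal sketch of the Component Process Model predictions in Table 2
# (derived from Scherer, 2001): each appraisal check maps to the Action Units
# it is predicted to recruit, and a modal emotion is rendered as the sequence
# of AU groups produced by its successive appraisal outcomes.

# Only the disgust and happiness rows of Table 2 are transcribed here.
CPM_PREDICTIONS = {
    "disgust":   [("unfamiliar/discrepant", "4+7"),
                  ("unpleasant", "9, 10, 15, 39"),
                  ("high coping", "23+25, 17+23, 6+17+24")],
    "happiness": [("sudden", "1+2"),
                  ("pleasant", "12+25"),
                  ("conducive", "6+12")],
}

def predicted_sequence(emotion):
    """Return the appraisal checks and predicted AUs for one modal emotion."""
    return CPM_PREDICTIONS[emotion]

if __name__ == "__main__":
    for check, aus in predicted_sequence("disgust"):
        print(f"{check:25s} -> AU {aus}")
```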


III. EXPERIMENTAL SECTION
