

2.3 Emotion assessment from physiological signals

In recent years, emotion recognition from physiological signals has received much interest. The goal is to find a computational model that is able to associate a given physiological pattern with an emotional state. This can be done with several methods discussed in this chapter.

Table 2.4 provides a (non-exhaustive) list of relevant studies concerning emotion assessment from physiological signals. Unfortunately, it is difficult to compare these studies because they differ on several criteria. Six criteria are introduced below to help the reader gain insight into the current state of the art as well as to discuss important aspects of emotion assessment from physiological signals. The studies are listed in Table 2.4 according to those six criteria.

Number of participants: the results of a study that includes a large number of participants can be regarded as more significant. Another important point is whether the model obtained for emotion identification is user-specific or not. A user-specific model avoids the problems related to inter-participant variability, but a new model has to be generated for each new user.
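To make this distinction concrete, the sketch below (using placeholder data and the scikit-learn library) contrasts a user-independent evaluation, where the classifier is tested on participants it has never seen, with user-specific models trained and validated within each participant; all variable names and values are illustrative.

```python
# Minimal sketch (placeholder data): user-independent vs. user-specific evaluation.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # physiological feature vectors (placeholder)
y = np.tile([0, 1], 100)                  # two emotion classes (placeholder)
groups = np.repeat(np.arange(10), 20)     # participant identifier of each sample

clf = SVC(kernel="rbf")

# User-independent model: train on some participants, test on participants never seen in training.
acc_independent = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut()).mean()

# User-specific models: one model per participant, validated within that participant only.
acc_specific = np.mean([
    cross_val_score(clf, X[groups == p], y[groups == p], cv=StratifiedKFold(n_splits=5)).mean()
    for p in np.unique(groups)
])
print(acc_independent, acc_specific)
```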

Emotion elicitation: how to elicit emotions from a participant is also a question of significant interest, especially considering the importance of context as detailed before. Picard [106] proposed five factors to describe the context in which the emotions are elicited. One of those five factors divides emotion elicitation approaches into two categories: subject-elicited and event-elicited. In the first category, emotions can be generated by asking the participant to act as if he/she were feeling a particular emotion (for instance by mimicking the facial expression of anger) or to remember a past emotional event of his/her life. This method has often been used in facial expression recognition, and it has been shown in [5] that it is effective for inducing specific peripheral activity. In the second category, it is possible to use images, sounds, video clips or any emotionally evocative task. Several databases of stimuli have been designed for the purpose of emotion elicitation, such as the International Affective Picture System (IAPS) and the International Affective Digitized Sounds (IADS) [54]. These databases are generally accompanied by affective evaluations from experts or by average judgments of several people. However, since past experience plays a key role in emotion elicitation, it can also be important to ask the user to report and self-assess his/her feelings. Emotion elicitation is also influenced by the number and the complexity of the targeted emotions.

Time: temporal aspects are also relevant for emotion recognition but have received little attention so far. According to [20], it is possible to define time categories that range from the “full blown emotions”, lasting for some seconds or minutes, to traits, lasting for years if not a whole lifetime. In between are categories such as moods or emotional disorders. In human-computer interaction, most of the applications under consideration deal with what Cowie defines as “full blown emotions”, thus managing phenomena that last from seconds to minutes. In an ideal affective interface, the emotion of the user should be detected as soon as possible (say within a few seconds) in order to take the decision that matches the user’s current expectation, and not one that would have been appropriate minutes before. Synchronization of the different modalities is also an issue, since the activation of physiological outputs can occur at different time scales. For instance, a change in skin temperature is much slower than a change in brain activity, which occurs a few milliseconds after emotion elicitation.
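As an illustration of the synchronization issue, the following sketch resamples two modalities recorded at different (made-up) rates onto a common time base before any joint analysis; the signals and rates are placeholders, not taken from any particular study.

```python
# Minimal sketch (illustrative rates): putting two modalities on a common time base.
import numpy as np
from scipy.interpolate import interp1d

fs_eeg, fs_temp = 256.0, 4.0                 # EEG sampled fast, skin temperature slowly
t_eeg = np.arange(0, 10, 1 / fs_eeg)         # 10 s of recording
t_temp = np.arange(0, 10, 1 / fs_temp)
eeg = np.sin(2 * np.pi * 10 * t_eeg)         # placeholder signals
temp = 32.0 + 0.05 * t_temp

# Common time base at the slower rate; a real pipeline would low-pass filter or
# extract windowed features from the fast signal first to avoid aliasing.
t_common = t_temp
eeg_aligned = interp1d(t_eeg, eeg, bounds_error=False, fill_value="extrapolate")(t_common)
temp_aligned = temp                          # already on the common time base
```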

[Table 2.4 body not reproduced: only the column headings (Ref., Number of participants, Signals / sensors, Emotion classes, Methods) and scattered cell fragments survived extraction.]

Table 2.4. List of publications on emotion assessment from physiological signals. Signal acronyms: Electromyography (EMG), Electrocardiogram (ECG), Galvanic Skin Response (GSR), Electroencephalography (EEG), Blood Volume Pulse (BVP). Classification acronyms: Sequential Floating Forward Search (SFFS), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Mean Square Error (MSE), Multi-Layer Perceptron (MLP), K-Nearest Neighbors (KNN), ANalysis Of Variance (ANOVA).

Table note: the physiological activity of each participant was recorded for only one or two of the emotional conditions; the final number of participants for each condition is given in the “Emotion classes” column.

Sensors / modalities: as described in chapter 2.2, various sensors can be used to measure physiological activity related to emotional processes. However, sensors used for emotion assessment should be chosen carefully so that they do not disturb the user. First, sensors should not be uncomfortable for the user, in order to avoid parasitic emotions such as pain. Secondly, they should not prevent the use of classical modalities, for instance by monopolizing the hands of the user. When physiological sensors are used for affective computing, they switch from the standard status of sensors to that of modalities that communicate information about the emotional state of the user. It is then necessary to merge these modalities to perform emotion assessment in a reliable way, taking into account redundant and complementary information, such as the relation that exists between HRV and respiration [117, 118].
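A simple way to merge such modalities is feature-level fusion, i.e. concatenating the feature vectors computed for each modality before classification; the sketch below uses hypothetical feature arrays and leaves the handling of redundancy to a later feature-selection step.

```python
# Minimal sketch (hypothetical feature arrays): feature-level fusion of modalities.
import numpy as np

n_trials = 50
gsr_feats = np.random.rand(n_trials, 4)     # GSR features per trial (placeholder)
hrv_feats = np.random.rand(n_trials, 6)     # HRV features per trial (placeholder)
resp_feats = np.random.rand(n_trials, 3)    # respiration features per trial (placeholder)

# One fused vector per trial; redundant or complementary information across modalities
# (e.g., the HRV-respiration coupling) is then handled by feature selection or the classifier.
X = np.hstack([gsr_feats, hrv_feats, resp_feats])
```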

Emotion models / classes: as discussed in chapter 2.1, several representations of emotions are available. Section 2.1.4 discusses their usability and their extensions for the purpose of emotion assessment. In order to define a computational model that links physiological reactions to affective states, it is necessary to record physiological activity together with a ground-truth (i.e. the true elicited emotional state). However, constructing an emotional ground-truth is a difficult task, because emotional states are influenced by several components of the organism, as explained in Section 2.1.3.b. Several methods can be used to construct a ground-truth, for instance by defining the emotional state a priori (generally based on previous studies or evaluations of the stimuli) or by asking the participants to self-report their feelings. Depending on the method used, the resulting truth can be quite different. The advantages and disadvantages of the ground-truth construction methods are discussed in Section 4.1.1.
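As a small illustration, the sketch below builds two alternative ground-truths for the same (made-up) trials, one from a priori stimulus labels and one from hypothetical self-assessed valence ratings on a 1-9 scale, showing that the two need not agree.

```python
# Minimal sketch (made-up values): two ways of constructing an emotional ground-truth.
a_priori_label = ["positive", "positive", "negative", "negative"]   # from prior stimulus evaluations
self_report_valence = [7, 4, 2, 6]                                  # self-assessed valence, 1-9 scale

# Ground-truth from self-reports: threshold the ratings around the scale midpoint.
self_report_label = ["positive" if v >= 5 else "negative" for v in self_report_valence]

# The two ground-truths need not agree (here they differ on the 2nd and 4th trials),
# which is why the construction method matters.
disagreement = [a != b for a, b in zip(a_priori_label, self_report_label)]
```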

Methods: a wide range of methods has been used to infer affective states. Most of them belong to machine learning and pattern recognition. Classifiers like k-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), neural networks, Support Vector Machines (SVMs) and others [119, 120] are useful to detect emotional classes of interest. Regression techniques [120] can also be used to obtain a continuous estimation of emotions, for instance in the valence-arousal space. Prior to inferring emotional states, it is important to define some physiological features of interest. It is very challenging to identify features of physiological signals that reliably correlate with the affective state of users: such features frequently differ from one user to another, and they are also very sensitive to day-to-day variations as well as to the context of the emotion induction. To perform this selection, researchers generally apply feature-selection or projection algorithms like Sequential Floating Forward Search (SFFS) or Fisher projection.
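The sketch below illustrates a typical pipeline of this kind on placeholder data: sequential forward feature selection (used here as a simplified stand-in for SFFS, which additionally allows backward "floating" steps) followed by an LDA classifier, evaluated by cross-validation.

```python
# Minimal sketch (placeholder data): feature selection followed by classification.
# Plain sequential forward selection is used as a stand-in for SFFS,
# which additionally allows conditional backward ("floating") steps.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))           # physiological features (placeholder)
y = rng.integers(0, 4, size=120)         # four emotion classes (placeholder)

lda = LinearDiscriminantAnalysis()
selector = SequentialFeatureSelector(lda, n_features_to_select=8, direction="forward", cv=5)
pipeline = make_pipeline(selector, lda)

accuracy = cross_val_score(pipeline, X, y, cv=5).mean()
print(f"cross-validated accuracy: {accuracy:.2f}")
```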

As can be seen from Table 2.4, there are large differences in classification accuracy, even between studies that employ the same number of classes. Although this can be partly explained by the factors detailed above, we believe that such variations mostly result from differences in emotion elicitation strategies, in the type of physiological signals (modalities) used, and in the chosen model of emotions (or emotional classes).

For emotions elicited by an event, the best results obtained on more than three classes are those presented in [93] and [111]. One interesting point is that even though these studies use different stimuli (film clips versus music), they both use a method to control the validity of the elicited emotions. In [93], the film clips presented were chosen according to the results of a pilot study in which several participants were asked to evaluate clips by stating the emotion they felt as well as its intensity. In [111], the authors worked with the Augsburger Database of Biosignals (AuDB), for which the participants were asked to freely choose music that matched the desired emotions. Both of these methods ensure that the emotions felt during the experiment are intense and correspond to the expected ones. The good results obtained in [106] using a self-induction protocol also tend to confirm the importance of reliable elicitation. In [106], the participant was the experimenter herself, so that the emotions to be induced were perfectly clear to her. Moreover, the participant remembered past emotional events for emotion elicitation; this implies a strong intensity, since remembered events are generally those that induced intense feelings.

Diverse types of physiological activity measurements, from both the peripheral and the central nervous system, have been used. Up to now, most of the studies using brain activity for emotion assessment have relied on EEG signals, with unconvincing results [36, 115]. One could conclude that EEG signals, in the present state of the art, are not effective for emotion assessment; the present thesis, however, will argue against this. Describing the state of the art for emotion recognition based on peripheral signals is a real challenge, because most studies performed classification on feature sets that include features from many types of signals, thus preventing analysis of single modalities. However, some signals are employed in nearly all studies, like GSR and HR (extracted from ECG or plethysmography). These two signals are known to correlate well with affective states [5, 7, 92]. Many results also show that EMG signals are relevant for emotion assessment [6, 111, 112, 116, 121]. Generally, EMG electrodes are positioned to measure facial muscle activity. Muscles that are often recorded include the frontalis (raising of the eyebrows), the zygomaticus major (smiling) and the corrugator supercilii (frowning of the eyebrows). Since emotional states are often associated with facial expressions, measuring facial activity is strongly relevant to assess emotions. However, having EMG electrodes on the face is quite uncomfortable, which could hamper their use in a concrete application.

In order to assess emotions using physiological signals, it is necessary to extract features from those signals that are relevant to the problem. In the current state of the art, most studies [93, 106, 107, 122] treat physiological signals as independent Gaussian processes and thus extract statistical features such as the mean, the standard deviation and sometimes higher-order moments (skewness and kurtosis). However, it is likely that this independence assumption does not hold, since the peripheral signals are driven by the same systems, such as the ANS. Moreover, correlations exist between the different signals because they are modulated by a single reaction to emotions; for instance, both finger blood pressure and skin temperature are influenced by vasoconstriction. To our knowledge, no study tries to take those interactions into account at the feature extraction level. Most studies perform feature selection after feature extraction with the goal of removing redundant features and keeping relevant and synergetic ones. Other features frequently extracted in the temporal domain are statistical features (mean, standard deviation, etc.) of the first and second derivatives of the signal [87, 93, 106, 113, 123]. Apart from temporal features, features from the frequency domain are also frequently extracted, especially for HR, respiration and EEG [6, 109, 110, 115, 122, 124].
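As an illustration, the sketch below computes these commonly used features on a single synthetic signal: statistical moments of the signal and of its first and second derivatives, plus the power in an arbitrary frequency band estimated with Welch's method; the signal, sampling rate and band are placeholders.

```python
# Minimal sketch (placeholder signal): typical temporal and spectral features.
import numpy as np
from scipy.stats import skew, kurtosis
from scipy.signal import welch
from scipy.integrate import trapezoid

fs = 128.0                                   # sampling rate (placeholder)
t = np.arange(0, 30, 1 / fs)
signal = np.sin(2 * np.pi * 0.3 * t) + 0.1 * np.random.randn(t.size)

def moment_features(x):
    """Mean, standard deviation, skewness and kurtosis of a 1-D signal."""
    return [np.mean(x), np.std(x), skew(x), kurtosis(x)]

features = (
    moment_features(signal)
    + moment_features(np.diff(signal))       # first derivative
    + moment_features(np.diff(signal, n=2))  # second derivative
)

# Frequency-domain feature: power in an illustrative 0.1-0.5 Hz band (Welch estimate).
freqs, psd = welch(signal, fs=fs, nperseg=1024)
band = (freqs >= 0.1) & (freqs <= 0.5)
features.append(trapezoid(psd[band], freqs[band]))
```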

One of the most obvious observations that can be made from Table 2.4 is that different models of emotions lead to different classification accuracies. This is especially clear when comparing the basic-emotions models, which generally include more than three categories, to models based on the valence-arousal space, which include two or three categories. Thanks to the work of Wagner [111] and Picard [106], it is possible to compare results on valence-arousal classes to those obtained on basic-emotion classes within a single study. The conclusion emerging from this comparison is that the accuracies reported with the valence-arousal representation are lower than those reported with basic emotions, even though the valence-arousal representation involves fewer classes. A possible explanation is that the acquisition protocols were designed for basic-emotion elicitation, while the valence-arousal classes were defined by regrouping the basic emotions into two sets of classes (high vs. low arousal and high vs. low valence). This coarse definition can lead to classes that are easily confused: for instance, grouping bliss and joy in the same positive class is prone to errors, since the peripheral activity for bliss, which is a rather calm feeling, is certainly different from that of excited joy. However, results from other studies indicate that classification in the valence-arousal space is often associated with lower accuracies than when using basic-emotion labels. Moreover, identification of valence classes is generally harder than identification of arousal classes (Table 2.4), which supports the idea that peripheral activity correlates more with arousal than with valence [7]. No clear differences in accuracy can be observed between the studies using different labels and numbers of classes to recognize basic emotions.
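The regrouping described above can be made explicit with a small mapping such as the one sketched below (labels and quadrant assignments are illustrative only): each basic-emotion label is projected onto coarse valence and arousal classes, which is precisely how a calm positive state and an excited positive state end up in the same valence class.

```python
# Minimal sketch (illustrative assignments): regrouping basic-emotion labels
# into coarse valence and arousal classes.
basic_to_va = {
    "joy":     ("positive", "high"),
    "bliss":   ("positive", "low"),    # calm positive state
    "anger":   ("negative", "high"),
    "sadness": ("negative", "low"),
}

trial_labels = ["joy", "bliss", "anger", "sadness", "joy"]
valence_labels = [basic_to_va[e][0] for e in trial_labels]   # joy and bliss merge here
arousal_labels = [basic_to_va[e][1] for e in trial_labels]
```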

Only some of the studies listed in Table 2.4 performed emotion recognition online [25, 114, 125]. The others acquired the signals in a well-controlled experimental environment and, in a second step, applied emotion recognition algorithms offline, making use of cross-validation methods to determine the effectiveness of their system on unseen data. In [114], the authors first trained a model to associate physiological patterns with emotions and then applied the same model to detect emotions during the presentation of images from the IAPS. As can be seen from Table 2.4, the accuracy obtained on three classes (neutral, negative and positive) is quite high (71%), demonstrating the feasibility of online emotion recognition. However, only one participant was tested with online detection, preventing further conclusions concerning the reproducibility of this approach. Moreover, recordings and online classification were performed in experimental conditions that are far from ecological, real application conditions.

Another example of online emotion assessment is [125], where the authors designed a fuzzy emotion detection system for the purpose of studying emotions during game play. However, no online accuracy is reported for their system; instead, the authors directly use the assessed emotions to compare different game-play conditions. One of the most relevant studies concerning online emotion recognition is certainly [25], where pleasure (low vs. high) is assessed from the physiological signals of autistic children in order to adapt the behavior of a robot arm playing with them. The objective is to construct a robot able to change its gaming strategies as a psychologist would do. In this case, the authors reported an average online accuracy of 81.1% when the two assessed emotional classes were compared to a ground-truth based on psychologist judgments. In this study, emotions were assessed while the children were interacting with the robot, thus showing the possibility of emotion assessment for applications in a real HMI environment.