
Chapter 4 Methods for emotion assessment

4.1.1 Ground-truth definition

Defining a ground-truth for emotion assessment depends strongly on the protocol used to record emotional reactions. There are two ways to elicit emotions (Section 2.3):

- by asking the participant to self-generate a given emotion;

- by stimulating him / her with material containing emotional content (images, sounds, video clips, etc.).

In the first case, the annotation is straightforward, since each trial can be annotated with the emotion that the participant was supposed to express. However, this assumes that the requested emotions were successfully elicited, and thus requires appropriate control of the elicitation procedure.

In the second case, the annotation of each trial (corresponding to a stimulus) can be done by any of the following three methods:

- by a-priori defining the emotional label of each stimulus;

- by determining the elicited emotion from the observation of the participant’s emotional expressions (for instance facial expressions);

- by asking the participant to self-assess his / her feelings after the stimulation.

The a-priori annotation can be done arbitrarily, for instance according to the judgment of the experimenter. However, this method is not recommended since it does not take into account the variability of judgments that can be observed in a population. Another possibility is to ask a large population to evaluate the stimuli extensively. Each stimulus can then be associated with the most frequent label, or with the average of the judgments if emotions were evaluated in a continuous space such as the valence-arousal space. For instance, each of the IAPS images [54] is provided with the mean and the standard deviation of the valence-arousal judgments of 800 persons.
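The two aggregation rules mentioned here (most frequent label, or mean and standard deviation of continuous judgments) can be illustrated with a minimal sketch. The function names, the example labels and the rating values below are hypothetical and are not taken from the study or from the IAPS data; they only show the arithmetic.

```python
from collections import Counter
from statistics import mean, stdev


def most_frequent_label(labels):
    """Return the label given by the most raters (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def mean_valence_arousal(ratings):
    """Return ((mean valence, mean arousal), (sd valence, sd arousal)).

    `ratings` is a list of (valence, arousal) pairs, one pair per rater.
    """
    valences = [v for v, _ in ratings]
    arousals = [a for _, a in ratings]
    return (mean(valences), mean(arousals)), (stdev(valences), stdev(arousals))


# Hypothetical judgments of a single stimulus by five raters.
labels = ["joy", "joy", "fear", "joy", "neutral"]
ratings = [(7.0, 5.0), (8.0, 6.0), (3.0, 7.0), (7.0, 5.0), (5.0, 2.0)]

print(most_frequent_label(labels))        # joy
centre, spread = mean_valence_arousal(ratings)
print(centre)                             # (6.0, 5.0)
```

The standard deviation returned alongside the mean gives an indication of how much raters disagree about a stimulus, which is the variability that a single experimenter's judgment cannot capture.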

However, this method is still not optimal, since participants can react differently to the same stimulus, due for instance to differences in past experience. As an example, an image of someone skiing can elicit pleasure, but it can also elicit negative emotions in a participant who has had a bad experience on skis. To alleviate this problem, each participant can be asked to select and evaluate images before the experiment, so that the elicited emotions are known in advance. The drawback is that the stimuli are then already known to the participants and may thus elicit different emotional responses during the experiment (for instance less intense responses).

Analyzing the emotional behavior and expressions of the participant is another possible method to determine the elicited emotion. However, this method requires that at least one expert (a psychologist for instance) be present during the experiment. To avoid bias in the estimated emotions, it is even better to pool the judgments of several experts, which raises the question of how to combine different judgments. The participant should also be free to fully express his / her emotions, which is not always the case: for instance, when physiological signals are recorded, movements are generally limited to avoid noise in the signals. Finally, a model trained on this type of ground-truth will be able to detect the expression of an emotion but not the other factors involved in emotions, such as subjective feelings (see Section 2.1.3.b).

Presenting the stimuli to the participants and then asking them to self-assess their feelings is an alternative to the two preceding annotation methods. This can be done by asking the participants to fill in questionnaires, to give emotional labels and to evaluate their emotions in the valence-arousal space. While this method alleviates the problems related to the subjectivity of emotions, it relies on the assumption that participants are reliable experts in evaluating their own feelings. This assumption is debatable for the following reasons. Firstly, the participant will determine the elicited emotion mostly from his / her subjective feeling, which is only one of the factors involved in the emotion (see Section 2.1.3.b). Secondly, misunderstanding of the material used for self-assessment could lead to wrong annotations: as stated in Section 2.1.4, emotional words can have different meanings across persons and cultures. Moreover, representing an emotion in the valence-arousal space is not straightforward, and explanations must be provided before someone can use this space. Thirdly, a participant can hide his / her true emotional state because of social rules. For instance, a man may hesitate to report high arousal when viewing a picture of a nude man.

The best way to annotate data is certainly to combine the different annotation methods described above. One example is to combine annotations from experts, who evaluate an emotional state from several emotional cues (facial expressions, speech, behavior, physiological signals, etc.), with self-reported measures of emotions. This would make it possible to determine a ground-truth based on several of the components involved in emotional processes.

However, it is not clear how the fusion of the different annotations should be performed.

In this study both self-generated and stimuli-based emotions were elicited in different protocols.

For the protocol where emotions were self-generated, the trials were directly annotated with the corresponding emotional state, as explained above. For the protocols where emotions were elicited using stimuli, two of the annotation methods described above were employed: a-priori annotation and self-assessment. The effectiveness of those methods was thus estimated based on the accuracy of the emotion assessment.

The above paragraphs detail how to collect a ground-truth without any assumption on the type of annotations that are collected. Those annotations could be continuous (for instance by using the valence-arousal space) or discrete (using emotional labels such as fear and anger). Mainly because of its generality, but also for other reasons detailed in Section 2.1.4, the valence-arousal space was chosen as the representation for emotional states. Accurately determining a point in this space based only on physiological features is a difficult task. For this reason, we preferred to take a first step by defining valence-arousal classes of interest. Thus each element y_i of the vector y takes its value in a set of emotional labels Y = {1, …, C}, where C is the number of classes.

Depending on the protocol, the valence-arousal space was thus divided into different regions, each associated with a target label. For instance, three regions of interest (C=3) can be defined by segmenting the valence-arousal space into calm, excited-positive and excited-negative areas. Another possibility is to segment the space into two areas (C=2), such as calm vs. excited or positive vs. negative. The segmentation can be done a-priori, in which case the participants annotate their emotions according to the predefined classes, or a-posteriori, from continuous annotations gathered during the protocol. All these possibilities give rise to different ground-truth vectors y and thus correspond to different formulations of the emotion assessment problem, which are called classification schemes. Several classification schemes were studied for the data gathered from each protocol; they will be detailed in the appropriate chapters.
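An a-posteriori segmentation of this kind can be sketched as a simple thresholding of continuous annotations. The sketch below is only an illustration: the 1 to 9 rating scale, the thresholds placed at the scale midpoint, the mapping of integers to regions and the function names are all assumptions, not the segmentation actually used in the protocols.

```python
def three_class_label(valence, arousal, valence_threshold=5.0, arousal_threshold=5.0):
    """Map a (valence, arousal) point to a label in {1, 2, 3} (C = 3).

    1 = calm, 2 = excited-positive, 3 = excited-negative.
    Thresholds are hypothetical (midpoint of an assumed 1-9 scale).
    """
    if arousal <= arousal_threshold:
        return 1
    return 2 if valence >= valence_threshold else 3


def two_class_label(arousal, arousal_threshold=5.0):
    """Map an arousal value to a label in {1, 2} (C = 2): 1 = calm, 2 = excited."""
    return 1 if arousal <= arousal_threshold else 2


# Hypothetical continuous self-assessments, one (valence, arousal) pair per trial.
annotations = [(7.5, 8.0), (2.0, 7.0), (5.5, 2.5)]

# Each classification scheme yields its own ground-truth vector y.
y_three = [three_class_label(v, a) for v, a in annotations]
y_two = [two_class_label(a) for _, a in annotations]

print(y_three)  # [2, 3, 1]
print(y_two)    # [2, 2, 1]
```

The same continuous annotations produce two different vectors y, which shows how one recording can support several classification schemes.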

From the document Emotion assessment for affective computing based on brain and peripheral signals (pages 88-91)