
Sequential Analysis of Communicative Behaviors – Methodological Issues

T-pattern statistics by cluster

Enjoyment cluster 

The positive emotions cluster is characterized by higher ratings on the joyful, entertained and enthusiastic scales. Out of the five clusters, the enjoyment group contains the smallest number of video samples. It is composed of 14 clips (7%) out of the original 200.

This is no surprise since the experimental protocol was designed to elicit mainly negative emotion narratives. In this cluster, we have found 134 different t-patterns in total. The length of the t-patterns in the experimental data varied from 2 to 6 behavior units, with a mean of 2.7 and a mode of 2.

In contrast, the mean number of t-patterns detected in the rotated and shuffled data was 5, with a maximum length of 2 behavior units.

This figure shows the number of t-patterns detected in the positive emotion cluster and the mean number detected after 10,000 shufflings (blue bar) and random rotations (red bar) of the same data.
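The two randomization procedures named in the caption can be illustrated with a minimal sketch. It assumes that "shuffling" redistributes the occurrence times of each behavior code at random over the observation period, and that "rotation" shifts each code's series by its own random offset, wrapping around the end of the period. This is an assumption about what the two procedures do, not a description of THEME's internal implementation; the function names, the behavior codes and the time values are hypothetical.

```python
import random


def shuffle_series(times, start, end, rng):
    """Assumed 'shuffling': keep the number of occurrences of a code,
    but draw each occurrence time uniformly over the observation period."""
    return sorted(rng.uniform(start, end) for _ in times)


def rotate_series(times, start, end, rng):
    """Assumed 'rotation': shift the whole series by one random offset,
    wrapping around the end of the observation period."""
    span = end - start
    offset = rng.uniform(0, span)
    return sorted(start + ((t - start + offset) % span) for t in times)


def randomize(events, start, end, method, seed=0):
    """Apply one randomization method independently to each behavior code.

    events maps a behavior code to its list of occurrence times.
    """
    if method not in ("shuffle", "rotate"):
        raise ValueError("method must be 'shuffle' or 'rotate'")
    rng = random.Random(seed)
    fn = shuffle_series if method == "shuffle" else rotate_series
    return {code: fn(times, start, end, rng) for code, times in events.items()}


if __name__ == "__main__":
    # Hypothetical coded clip: two behavior codes over a 10-second period.
    clip = {"AU12": [1.0, 4.5, 9.2], "gaze_away": [0.5, 3.0]}
    print(randomize(clip, 0.0, 10.0, "shuffle", seed=1))
    print(randomize(clip, 0.0, 10.0, "rotate", seed=1))
```

Under these assumptions, both procedures preserve the number of occurrences of each code. Rotation also preserves the intervals within each code's series (up to the wrap-around) and only breaks the timing between codes, whereas shuffling destroys both.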

The significance of the difference between the number of independent patterns found in the experimental data and the number found in the simulated data is computed on the basis of the distance, expressed here in standard deviations, between the number of “experimental” patterns and the distribution of the simulated values around their mean. The large distance (>23 SD) reported in figure 4 strongly suggests that the t-patterns in this cluster are not the result of chance effects (p-value set at 0.001).
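The distance measure described here amounts to a standard score: the experimental count minus the mean of the simulated counts, divided by the standard deviation of the simulated counts. The sketch below illustrates that arithmetic only. The function name and the simulated counts are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def sd_distance(observed, simulated_counts):
    """Distance, in standard deviations, between the observed number of
    patterns and the distribution of counts from randomized data."""
    mu = mean(simulated_counts)
    sigma = stdev(simulated_counts)  # sample SD: an assumption
    if sigma == 0:
        raise ValueError("simulated counts have zero variance")
    return (observed - mu) / sigma


if __name__ == "__main__":
    # Hypothetical counts from a handful of randomization runs.
    simulated = [3, 4, 5, 6, 7]
    print(sd_distance(134, simulated))
```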

Following the selection procedure described above, 24 independent patterns (18%) were kept for further analysis.

Hostility cluster 

The hostility cluster combines video samples that were rated high on the “disgusted”, “angry” and “scornful” adjective scales. It is the second largest group in terms of the number of video samples (N=54) included in the analysis, between the “sadness” (N=55) and “embarrassment” (N=51) clusters. These 54 files represent 26% of the entire corpus. The program detected 3903 t-patterns in the original dataset. Pattern length ranges from 2 to 7 behavior units, with a mean of 3.99 and a mode of 4.

In contrast, the simulations do not produce any random patterns that have more than three behavioral units. The differences between experimental and simulated data are much smaller for short patterns of two events than for those with three units (>300 SD).

Even then, the difference is still large enough, above nine standard deviations, to consider these patterns for further comparison with other clusters.

Out of the 3903 initial t-patterns detected by THEME, 227 (6%) met the EMFACS and sequentiality criteria and were kept for further analysis.

Embarrassment cluster 

The embarrassment cluster combines the “embarrassed” and “nervous” scales and brings together 51 files, or 26% of the core set. In total, 2069 t-patterns were detected.

Pattern length in the experimental data ranges from 2 to 6 behavior units, with a mean of 3.4 and a mode of 4.

Interestingly, in the rotated and shuffled data the program detects almost no t-patterns.

The mean number of patterns after the application of the shuffling procedure is 0.5 for sequences of two events. After rotation it falls below 0.5. Note also that the distances between experimental and simulated counts, expressed in standard deviations, are very large: 687 for shuffling and 1,250 for rotations. Again, we can be confident that the associations of events found in the t-patterns of the embarrassment cluster cannot be explained by chance effects.

Out of the 2069 original t-patterns detected by THEME, only 101 (5%) met the criteria for further inclusion in subsequent analysis.

Surprise cluster 

The surprise cluster combines the “perplexed” and “surprised” scales. After the positive emotions cluster, it is the least populated group in our dataset, with 26 video files.

This represents 13% of all the files in the core data set. Pattern length ranges from two to five events, with a mean and a mode of 3.

The length of the patterns found in the simulated data does not exceed three events.

Regardless of the simulation procedure applied, the program finds a mean of 2 patterns with a length of two events. For patterns of three events, the mean falls below 0.5.

The computed distance between the experimental and simulated data is 58 and 45 standard deviations for the shuffling and rotation methods, respectively. The conclusions about the validity of the t-patterns detected in the previous groups hold for the “surprise” cluster as well. Out of the 241 original t-patterns detected by THEME, 27 (11%) presented the necessary features for inclusion in subsequent analysis.

Sadness cluster

The sadness cluster is composed of the “disappointed” and “sad” scales. With 55 video records, this cluster contains the largest number of files from the core data set (28%). The number of independent patterns detected is also the largest of all the clusters: the program found 7605 t-patterns in total. Pattern length ranges from two to nine behavioral units, with a mean of 4.5 and a mode of 4.

The validity assessment points to a mean of 19 random patterns detected for sequences two events in length. With differences between “real” and simulated data above 30 standard deviations for both simulation methods, the t-patterns found in the experimental data still appear valid for two-event sequences. The t-pattern selection for this cluster led to a drastic reduction in the number of sequences to consider for further analysis: only 2% of the original 7605 patterns were kept (N=117).
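The share of t-patterns retained in each cluster can be recomputed directly from the counts reported above. The short sketch below does only that arithmetic; the counts come from the text, while the dictionary layout and the function name are illustrative choices.

```python
# (detected, kept) counts per cluster, as reported in the text.
CLUSTERS = {
    "enjoyment": (134, 24),
    "hostility": (3903, 227),
    "embarrassment": (2069, 101),
    "surprise": (241, 27),
    "sadness": (7605, 117),
}


def retained_share(detected, kept):
    """Percentage of detected t-patterns kept after selection."""
    return 100.0 * kept / detected


if __name__ == "__main__":
    for name, (detected, kept) in CLUSTERS.items():
        share = retained_share(detected, kept)
        print(f"{name:14s} {kept:4d} / {detected:5d} = {share:.1f}%")
```

Rounded to whole percentages, these ratios match the figures quoted for each cluster (18%, 6%, 5%, 11% and 2%).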

Summary of results 

At this stage, the examination of the general characteristics of the t-patterns found in the five clusters yields the following information. First, the number of t-patterns found in a cluster is in a linear relationship with the number of files composing the cluster. Nonetheless, the proportion of t-patterns involving at least one EMFACS action decreases when the number of sample files rises. We proposed two possible explanations for this phenomenon, which are not mutually exclusive. The first concerns the type of coding involved. EMFACS codes are “event” type codes, whereas a majority of the additional non-FACS codes are what we call “state” codes. By definition, “event” codes are scored based on their frequency of occurrence and vary from one file to the other. “State” codes, on the other hand, are scored positive on all the sample files; only the frequency of transitions from one modality of a variable to another varies across clusters. Second, the frequency of state transitions is sensitive to the time scale most characteristic of a specific variable. We argued that some variables, like eye or head movements, can be so rapid and pervasive that their frequency of occurrence increases dramatically with the number of files in a cluster, compared to less frequent and longer-lasting facial actions. We think that these two factors combined are the main reason why the proportion of t-patterns with EMFACS codes decreases when cluster size increases. Because we decided to keep only the patterns that included at least one EMFACS code, we were able to retain, depending on the cluster, from 2% to 24% of all the t-patterns originally detected by THEME. The second selection filter required that a pattern be composed of events that are sequenced in time rather than occurring simultaneously. This second filtering procedure did not have a dramatic impact on the proportion of t-patterns dropped.

Indeed, 71% to 87% of the t-patterns containing a core EMFACS action showed a sequential structure. The statistical validity of the patterns found in the clusters was estimated by comparing the number of t-patterns found in the experimental datasets with the mean number of patterns found after applying two randomization procedures implemented in THEME.

Results show that, for all the clusters, the difference between the number of patterns detected in the “real” data and the mean number of patterns found in the simulated data is large, ranging from 9 to 1,250 standard deviations. Note that patterns longer than three events are not found in either kind of randomized data, while in the real data patterns up to length 9 are detected. Even for patterns of length 2 and 3, the difference between real and randomized data is always large enough to suggest that the patterns detected by the program cannot be explained by chance effects. In the next section, we will examine the composition of the sequential patterns containing EMFACS actions, with an emphasis on the t-patterns that stand out as most specific to each cluster.
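The two selection filters summarized above (at least one EMFACS code, and events sequenced in time rather than simultaneous) can be sketched as a simple predicate over detected patterns. The data layout below is hypothetical and is not THEME's output format. The set of EMFACS codes is an illustrative subset, and "sequential" is operationalized here as strictly increasing onset times, which is an assumption about how the criterion was applied.

```python
from typing import NamedTuple


class Event(NamedTuple):
    code: str     # behavior code, e.g. an action unit or a gaze state
    onset: float  # onset time in seconds


# Illustrative subset only; the actual EMFACS code list is not given here.
EMFACS_CODES = {"AU4", "AU12", "AU15"}


def has_emfacs(pattern):
    """First filter: the pattern contains at least one EMFACS code."""
    return any(event.code in EMFACS_CODES for event in pattern)


def is_sequential(pattern):
    """Second filter (assumed form): every event starts strictly after
    the previous one, so no two events are simultaneous."""
    return all(a.onset < b.onset for a, b in zip(pattern, pattern[1:]))


def select_patterns(patterns):
    """Keep only the patterns that pass both filters."""
    return [p for p in patterns if has_emfacs(p) and is_sequential(p)]


if __name__ == "__main__":
    # Hypothetical detected patterns.
    detected = [
        [Event("gaze_away", 1.0), Event("AU12", 2.0)],       # kept
        [Event("gaze_away", 1.0), Event("head_down", 2.0)],  # no EMFACS code
        [Event("AU4", 1.0), Event("gaze_away", 1.0)],        # simultaneous
    ]
    print(len(select_patterns(detected)))
```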
