1 GENERAL INTRODUCTION

1.4 General method and goals

1.4.1 Machine learning as a hybrid computational approach

The classic computational approach has been relatively faithful in operationalizing appraisal theories.

Many CMAs have incorporated emotion causation by interactional appraisal input (cf. C2), componential emotional output (e.g., avatars with emotional expressions; cf. C1), and time dependence (cf. C3). Direct simulation of emotion processes (e.g., in a virtual environment with avatars) has proven useful for testing the plausibility of the underlying emotion theories, and for exposing hidden assumptions (Marsella et al., 2010). Three important disadvantages of the classic computational approach are that models (a) are typically not developed to test appraisal hypotheses, (b) tend to be overdetermined by theoretical assumptions, and (c) often cannot be compared formally due to incompatibilities between different architectures.

In general, these three disadvantages reflect the adverse tendency for computational models of appraisal to become splinter theories of their own (see Marsella et al., 2010, p. 22, Figure 1.2.1, for connections among theories and models): As noted, many CMAs are based upon the OCC model of appraisal. However, this model does not have detailed process assumptions and, as such, modelers often apply ad-hoc solutions to extend the model with, for example, full componentiality or time dependence. The more numerous such modifications, the further individual models diverge from the basic OCC template and from each other, which complicates formal comparisons. Although recent studies have attempted to address this issue by formalizing the language of CMAs (e.g., Broekens, Degroot, & Kosters, 2008; Reisenzein et al., 2013), the classic computational approach ultimately remains disadvantaged by its dependence on many untested assumptions and the limitations of manual programming.

The classic statistical approach to empirical modeling offers the flexibility to estimate relations between emotion components from empirical data, and has been applied directly to test hypotheses of appraisal theories. Unfortunately, the majority of research in affective science has used a univariate, linear, and time-independent approach to collecting and analyzing emotion data (Lewis, 2005). The theories and hypotheses that I have discussed in the preceding sections suggest instead that emotion data require a multivariate, nonlinear, and time-dependent model for analysis. However, no empirical study thus far has addressed all three computational challenges simultaneously.

The methodological goal of this thesis was to develop a solution to the computational challenges of appraisal theory (Table 1.1). For this purpose, I developed a hybrid computational approach to empirical modeling. This approach combines theory-driven modeling, by operationalizing structural or content assumptions of appraisal theories, with data-driven modeling, by using methods of statistical machine learning. Machine learning is a branch of statistics concerned with extracting complex patterns from large datasets, and is also sometimes referred to as "pattern recognition" or "data mining" (e.g., Bishop, 2006; Hastie, Tibshirani, & Friedman, 2009; Ripley, 1996). Models of machine learning have been developed primarily with the aim of detecting nonlinear data patterns (e.g., generalized additive models, nearest neighbors methods), but also to solve problems specific to certain data (e.g., high-dimensional measures, time series, spatial data). This makes machine learning an attractive tool for the modeling of emotion processes.

It is standard practice in psychological science to test research hypotheses by null hypothesis testing, usually with a linear model (e.g., t-test, correlation test, multiple regression). Machine learning largely eschews p-values and instead focuses on measures of generalization to establish the goodness of fit of a given model. This is typically achieved by (a) splitting the available data into a so-called training set and a validation set, (b) fitting a model to the training set (i.e., estimating model parameters), and (c) testing its predictive accuracy on the validation set (Hastie et al., 2009). This procedure is motivated by the finding that nonlinear models are often capable of achieving perfect fit (i.e., zero predictive error) on training data.4 However, perfect fit usually indicates that the model confuses random noise for patterns. Therefore, it is important to test a model on data it has not "seen" during training, to assess its generalizability.5

4 For example, a model that contains as many (or more) parameters as there are data points in the dataset.
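The train-validate logic described above can be sketched in a few lines. The following example is illustrative only (it is not an analysis from this thesis): a deliberately overflexible 1-nearest-neighbor model fits its training data perfectly yet generalizes poorly to held-out data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Noisy nonlinear data: y = sin(x) + Gaussian noise
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# (a) Split the available data into a training set and a validation set
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# (b) A 1-nearest-neighbor model memorizes every training point exactly
model = KNeighborsRegressor(n_neighbors=1).fit(X_tr, y_tr)

# (c) Compare predictive error on training versus validation data
train_error = np.mean((model.predict(X_tr) - y_tr) ** 2)
val_error = np.mean((model.predict(X_val) - y_val) ** 2)

print(train_error)  # 0.0: perfect fit to the training data
print(val_error)    # clearly above zero: noise was mistaken for signal
```

The gap between the two errors is the point: a perfect training fit says nothing about generalization, which only the validation error reveals.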

Machine learning is often considered a "black box" approach to statistical learning, in that models typically (but not always) forgo interpretability in favour of predictive power (e.g., artificial neural networks, support vector machines). However, this need not be the case, and models have also been developed that emphasize interpretability (e.g., decision trees, multivariate adaptive regression splines). In this thesis, I favoured the latter type of model but also employed some black box methods.

1.4.2 Outline of thesis research

Four studies were conducted for this thesis. Table 1.3 gives an overview of descriptive details for these studies, including the challenges modeled, the hypotheses addressed, the emotion components studied, and sample sizes. Modeling was conducted on very large datasets, three of which were obtained from earlier studies (Studies 1, 2, and 3b), one of which was generated artificially (Study 3a), and one of which was collected in a new experiment (Study 4). Table 1.3 shows that modeling involved an increasing level of complexity as studies progressed. Study 1 addressed only one computational challenge and one hypothesis in two emotion components. Study 4 addressed all three computational challenges (C1–C3) and investigated all five hypotheses (H1–H5) in all emotion components simultaneously. All studies used models of machine learning for the main analyses, supplemented with traditional inferential analyses (e.g., factorial ANOVA), data reduction methods (e.g., clustering, principal component analysis), and resampling methods (e.g., bootstrapping). Model fit was evaluated in the traditional manner by training and validation. A short description of each of these studies now follows.

Study 1 investigated interaction effects (H1) between appraisal criteria in a large dataset of recalled emotion experiences obtained from a survey study (Scherer & Meuleman, 2012). A categorical feeling response (12 levels; e.g., irritation, anxiety, pride) was modeled as a function of 25 appraisal criteria. The analysis consisted of two stages: (a) black-box modeling, to discover the best-fitting nonlinear model without regard to its interpretability, and (b) appraisal feature selection, in which I attempted to identify the appraisal criteria and interactions most predictive for differentiating the 12 feeling categories.
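The two-stage logic can be illustrated schematically. In the sketch below, synthetic stand-in data (not the actual survey data) mimic the problem shape of 25 appraisal criteria and 12 feeling categories, a random forest stands in for the black-box learner (the specific model is an assumption made here for illustration), and feature importances provide a simple form of feature selection.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical stand-in data: 25 appraisal criteria, 12 feeling categories
n, n_appraisals, n_feelings = 1000, 25, 12
X = rng.normal(size=(n, n_appraisals))

# Let the feeling category depend on two criteria and their interaction
score = X[:, 0] + X[:, 1] + 2.0 * X[:, 0] * X[:, 1]
bins = np.quantile(score, np.linspace(0, 1, n_feelings + 1)[1:-1])
y = np.digitize(score, bins)  # 12 roughly equal-sized categories

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Stage (a): fit a flexible black-box classifier
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
accuracy = forest.score(X_val, y_val)  # validation accuracy, chance = 1/12

# Stage (b): rank appraisal criteria by importance for feature selection
ranking = np.argsort(forest.feature_importances_)[::-1]
print(accuracy, ranking[:2])  # the two informative criteria should rank first
```

Because the synthetic category depends only on criteria 0 and 1 (including their product), a well-fit forest places those two at the top of the importance ranking, mirroring stage (b) of the study.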

Study 2 extended the methodology of Study 1 by investigating interaction effects (H1) and curvilinear associations (H2) of appraisal criteria on all other components of emotion. Data were obtained from the GRID study by Fontaine et al. (2003), which surveyed respondents in 27 countries on componential knowledge of 24 emotion categories (e.g., joy, guilt, shame). Data analysis combined principal component analysis (PCA) for data reduction and multivariate adaptive regression splines (MARS) for the automatic identification of interactions and the linear approximation of curvilinear relations (by splines).

5 In psychological research this procedure is common in factor analysis, where an exploratory factor analysis is often followed by a confirmatory factor analysis.
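MARS approximates a curvilinear relation by entering paired hinge functions, max(0, x − t) and max(0, t − x), into an otherwise linear model. The following hand-built miniature (synthetic data, not the GRID data) shows the idea; note that the knot t is fixed here, whereas MARS searches for knots automatically.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Hypothetical curvilinear appraisal-response relation: flat, then rising
x = rng.uniform(-3, 3, size=500)
y = 2.0 * np.maximum(0.0, x - 1.0) + rng.normal(scale=0.1, size=500)

def hinge(x, knot):
    """The paired hinge (spline) basis functions used by MARS."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

knot = 1.0  # fixed for illustration; MARS selects knots data-adaptively
h_pos, h_neg = hinge(x, knot)
basis = np.column_stack([h_pos, h_neg])

# A plain linear fit on the hinge basis captures the nonlinearity
fit = LinearRegression().fit(basis, y)
print(fit.coef_)  # coefficient on max(0, x - 1) should recover the slope of 2
```

A linear model in the raw predictor would misfit this relation badly; the same linear machinery on the hinge basis recovers it, which is exactly the sense in which MARS linearly approximates curvilinear relations.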

Table 1.3. Descriptive details of empirical studies.

Study   Challenges   Hypotheses    Training    Validation      Total
1       C2           H1               1,200         4,701      5,901
2       C1–C2        H1–H2              816           816       816⁶
3a      C1–C3        H1–H4            6,000         6,000     12,000
3b      C1–C3        H1–H4           41,856        41,856     83,712
4       C1–C3        H1–H5          192,348       187,209    379,557

Study 3 extended the methodologies of Studies 1 and 2 to time-varying emotion data. For this purpose, I proposed a new computational model for simulating and analyzing emotion episodes called Emergent Liquid State Affect (ELSA). ELSA operationalizes Scherer's appraisal theory (CPM; 1984, 2001, 2009b) using a combination of wavelet transformation, liquid state machines (e.g., Jaeger, 2001; Maass, Nattschläger, & Markram, 2002), and penalized regression models (e.g., Hastie et al., 2009). The chosen architecture enables ELSA to integrate time-series data of all emotion components while allowing dynamic feedback, delay effects, and nonlinear processes (H1–H3). Emotional states are emergent upon the time-varying synchronization (H4) of the overall system, which I proposed to measure with a new statistic called epsilon. This study focused primarily on describing the psychological and statistical background of the ELSA model. In addition, I applied the model to synthetically generated data (Study 3a) and to real data obtained from a study that measured electro-encephalogram (EEG) and electro-myogram (EMG) activity in response to manipulated appraisals of goal compatibility, power, and control (Study 3b; Gentsch, Grandjean, & Scherer, 2014).
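The reservoir-plus-penalized-readout idea behind this family of architectures can be sketched generically. The toy below uses a rate-based, echo-state-style reservoir with a ridge readout; it is a stand-in for, not a reproduction of, ELSA's liquid state machine, and it omits the wavelet preprocessing and the epsilon statistic. All sizes and signals are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)

# A toy multichannel input (standing in for component time series)
T, n_in, n_res = 500, 3, 100
u = rng.normal(size=(T, n_in))
target = np.roll(u[:, 0], 5)  # task: recall channel 0 with a 5-step delay

# Random, fixed reservoir weights; only the readout is trained
W_in = rng.normal(scale=0.5, size=(n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius below 1

# Drive the reservoir and record its state at every time step
states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in @ u[t] + W @ x)  # simple tanh reservoir update
    states[t] = x

# Penalized (ridge) linear readout maps reservoir states to the target,
# skipping an initial washout period
readout = Ridge(alpha=1.0).fit(states[50:], target[50:])
score = readout.score(states[50:], target[50:])
print(score)  # in-sample R-squared of the readout
```

The recurrent state carries a fading memory of past inputs, so a purely linear readout can recover delayed and nonlinearly mixed information; this is the sense in which such architectures accommodate delay effects and nonlinear processes while keeping the trained part a penalized regression.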

Study 4 applied the ELSA model to new experimental data. For this purpose, a videogame paradigm was developed to elicit emotion and to collect time-series data of all five components of emotion simultaneously. In the game, players navigated a maze to win money and deal with enemy attacks. Enemy attacks obstructed the goal of winning money and were manipulated orthogonally according to the appraisal criteria of coping (power to deal with an attack or not) and norm violation (the game is fair or cheats against the player). Physiological and expression data were measured during the game using heart rate, skin conductance, skin temperature, and facial EMG (i.e., direct objective methods). Motivation was measured during the game by recording key press speed (faster presses indicated more motivation to advance; i.e., an indirect objective method). Feeling was measured via retrospective self-report (i.e., an indirect subjective method) during a replay of the game recording. Supplementary appraisal and motivational self-reports were collected in an identical fashion. Forty subjects participated in the videogame study. Data analyses focused on the hypotheses of synchronization (H4) and feeling integration (H5).

6 Cross-validation in Study 2 was conducted by 20-fold internal cross-validation on the data; hence the training, validation, and total samples were equal.

The next chapters present these four studies and their results in detail. Following this, I integrate the results of all studies in the general discussion.