On the Symbolic Functions of Vocal Affect Expression


SCHERER, Klaus R.

Abstract

Starting from recent evidence on symbolic representational functions of primate calls, the question is raised as to whether human affect vocalisation might also serve cognitive representational as well as affective functions. An early model of sign functions, the Bühler Organon model, which postulates symbol, symptom and signal functions for all signs, is described. Furthermore, a distinction is made between push factors, which determine affect expression mainly via physiological effects, and pull factors, which influence expression via socially mediated models. Pull factors often seem implied in the signalling functions. While physiologically based push factors often provide symptoms of sender state, it is argued here that they may also serve cognitive representational functions in the sense that they reflect the antecedent cognitive appraisal processes that produced the affect state in the sender. In conclusion, a speculative link between the symbolic function of affect expression and the origin of language is discussed.

SCHERER, Klaus R. On the Symbolic Functions of Vocal Affect Expression. Journal of Language and Social Psychology, 1988, vol. 7, no. 2, p. 79-100

DOI : 10.1177/0261927X8800700201

Available at:

http://archive-ouverte.unige.ch/unige:101973


Klaus R. Scherer

Department of Psychology, University of Geneva, 24, rue Général-Dufour, CH-1211 Geneva 4, Switzerland


Reconsidering the Functions of Animal Vocalisations

It has been customary to consider most vocal expression in animals as indicative of underlying affective or motivational states (Darwin, 1872; Jürgens, 1979; Leyhausen, 1967; Marler & Tenaza, 1977; Morton, 1977; Tembrock, 1975). However, recent research (Cheney, 1984; Gouzoules, Gouzoules & Marler, 1984; Marler, 1984; Seyfarth & Cheney, 1982) indicates that the situation is more complex. In their studies of the alarm calls of vervet monkeys, these researchers found that calls are not only indicative of the emitter’s fear state, but are also specific to certain types of predators. Alarm calls produced for leopards, eagles or snakes, for example, have different sounds, energy levels and frequency ranges (see Figure 1). Therefore, Marler, Cheney and Seyfarth reject the notion that animal communication is limited to indicating the animal’s emotional or motivational state, arguing that most animal calls have a very strong referential symbolic component.

Supporting this notion is the observation that alarm calls seem to be learned. Evidence shows that infant vervet monkeys generalise calls (for example, applying the martial eagle call to all birds) and progressively learn to distinguish between predators (see Figure 2). This process is similar to the one in which children learning a language overgeneralise a word before applying it to a particular category. Therefore, the alarm call system does not simply ‘push out’ the affect

© 1988 K. R. Scherer 0261-927X/88/02 0079-22 $02.50/0

JOURNAL OF LANGUAGE AND SOCIAL PSYCHOLOGY Vol. 7, No. 2, 1988


Figure 1 Sonograms of acoustically different vocalisations uttered in four broad contexts (from Cheney, 1984).

[Sonogram panels: intra-group threat (chutter, grunt, pant-threat); inter-group threat (chutter, grunt, aarr); threat/alarm to primates (threat to unfamiliar baboon, human observer); other species (snake, leopard, eagle).]

underneath but reflects the outcome of predator classification, which, using the term in a wide sense, could reflect underlying ‘cognitive’ processes.

An exclusive emphasis on this symbolic function, however, neglects the fact that most signals are multifunctional. An early model developed by Bühler (1934) concerning the functions of speech, the Organon model (see Figure 3), can be used to analyse the functions of a sign: as a symbol in representing the object, event or fact it stands for; as a symptom of the state of the sign user; and as an appeal, or signal, trying to elicit a response from the receiver. The vervet monkey alarm call, for example, serves all three: it functions as a symbol of different predators, it functions as a symptom of the fear state of the animal and it functions as an appeal to others to run away. Furthermore, these functions are mutually interdependent. If


Figure 2 Hierarchical representation of the number of eagle alarm calls given by adults, juveniles and infants to different avian species. Broken line, 1-5 alarms; single line, 6-10 alarms; double line, 11-15 alarms; heavy line, >15 alarms (from Scherer, 1984).

[Three panels (adults, juveniles, infants), each linking eagle alarms to the avian species and mammals that elicited them, including martial eagle, hawk eagle, tawny eagle, owl, stork, vulture, heron, goose and pigeon.]

Figure 3 Bühler’s Organon model.

[Diagram labels recoverable from the figure: objects and facts; representation; symbol.]

a call refers to an air predator, both the emotional reaction and the appeal may be very different from one made in reference to a ground predator: in the first case, both emitter and receiver might seek shelter under a bush and freeze; in the second, they might become highly activated and run up a tree.


Push and Pull Effects in Human Affect Communication

How can we use this model to understand the potential symbolic/referential and signal/appeal functions in affective vocalisations? How can it be used to study the symptom aspects of human affect communication? These questions will be addressed in this paper.

To discuss the potential symbolic bases of human affect vocalisations, let us consider the following example. Two people are making their first attempt to eat oysters at home, and upon opening the shell, observe a black worm slither from the oyster. One of them immediately screeches, ‘Eee!’ This affect vocalisation or primary interjection (see Scherer, 1977; also Goffman, 1979) has a referential function because it refers to an unappealing matter, an expression function in communicating the disgust of the sign user, and an appeal function as it warns receivers to closely scrutinise what is on their own plates in order to avoid accidentally swallowing unpalatable matter. Clearly, affect vocalisations are the closest equivalents to animal calls and also serve the same three functions outlined by Bühler.

To continue with our example, let us take the case of a person eating in a restaurant who happens to observe another diner apparently relishing a dish of oysters containing the black worms. In this situation, some people might exclaim, ‘Yuck’. This exclamation is no longer a primary interjection, but what Wundt (1903) called a ‘secondary interjection’, i.e. an affect vocalisation that has been ‘domesticated’ by language (which explains the different transcriptions of interjections in different languages). The functions of this type of expression (which we might call a ‘vocal emblem’; see Scherer, 1977) differ from those of the primary interjection in that they are subject to cultural control: in this particular example, the representation function refers to a culturally defined class of objects which are revolting (as opposed to an agreeable stimulus in the environment), the expression function communicates a feeling of disgust held in check, and the appeal function signals to others that the speaker wishes to avoid the object in question. Secondary interjections (which can of course be positive in nature, e.g. the ‘Aah’ of satisfaction) also differ from primary interjections because they are vehicles of self-presentation (see Goffman, 1969; Schlenker, 1980): the speaker uses the sign consciously to give information about him- or herself. There is room for deception, however: a speaker may feign a primary interjection in order to appear more authentically emotional.

The major difference between the ‘Eee’ and the ‘Yuck’ can best be explained by the differentiation between ‘push effects’, in which physiological processes such as muscle tone push vocalisations into a certain direction, and ‘pull effects’, where external factors such as the expectations of the listener may pull the affect vocalisation into a different direction (Scherer, Helfrich & Scherer, 1980). In the push effect, given that muscle tone is likely to be higher in sympathetic arousal, the fundamental frequency of the voice (F0) will also be higher. Pull effects, on the other hand, are governed by social conventions such as display rules. These cultural conventions influence the production of signs required in social situations by specifying a particular acoustic target pattern, as opposed to mental concepts or internal physiological processes which push out expression (for a detailed discussion of push and pull effects, see Scherer, 1985). This distinction is important in understanding the differences between vocal productions. As far as primary and secondary interjections are concerned, the difference between the pure affect vocalisation ‘Eee!’ and the domesticated ‘Yuck!’ is that ‘Eee!’ is mostly push, whereas ‘Yuck!’ is at least partly if not mostly pull. Thus, push factors are defined as producing changes in subsystem states in the organism which have a direct effect on vocalisation parameters. They work largely involuntarily; the effects on vocal organs and the resulting acoustic parameters are almost exclusively determined by the nature and force of physiological mechanisms. Pull factors, on the other hand, although they are mediated through internal systems, are externally based: they operate towards the production of specific acoustic patterns or models, as in the case of detailed optimum signal transmission features or socially defined signal values (see detailed discussion below). For push factors, the physiological changes affecting phonation and articulation are the independent variables; the dependent variable consists of the acoustic patterns which are produced as a result. For pull factors, the acoustic pattern resulting from the vocalisation is the independent variable, while the phonatory and articulatory movements used to achieve this are the dependent variables.

A good analogy for push and pull effects is that of a man standing on a hillside trying to move a stone. If he pushes it, the stone will roll down, but the pusher cannot control exactly where it goes; if, however, the stone is at the bottom of the hill and the man pulls it up, the resting point is clearly defined, and he can, to a certain extent, determine the way the stone comes up. Similarly, if one is very aroused, one cannot control the details of one’s vocal production mechanisms, whereas if one desires to mimic the ecstatic howls of an enthusiastic juvenile, one can refer to an external model which is independent of the production mechanism used. Obviously, while the model or target pattern used is imported from the external world, the motivation to produce the respective pattern is internally controlled.

Let us outline some of the major types of pull effects:

One effect important in both animal and human communication is linked to sound transmission characteristics (see Brown, 1982). Certain types of voices are more easily localised because they carry well; in the forest these voices are a danger because they localise the emitter to predators, but they are an advantage when it comes to finding appropriate sexual partners.

Self-presentation is a pull factor in which vocalisations are pulled in different directions to give a positive appearance or express intentions. An interesting aspect of this issue is the notion of deception. While some sociobiologists have suggested that most signal use is deceptive, it seems reasonable to assume that occasional deception can only work if signals in general truthfully communicate the underlying state or intention of the sender (see Scherer, 1985: 212-15 for a detailed discussion of this point).

Attracting or repelling conspecifics through conventional social signals is another important pull effect. Acoustic patterns such as the vocal emblems mentioned earlier seem to be culturally determined: in Germany one says ‘Igittigitt’; in America, the expression is ‘Yuck’.


Finally, an important pull factor is due to interpersonal convergence or accommodation. If one is trying to accommodate, the speech patterns of the other person become the model that pulls production mechanisms into a given direction (see Feldstein & Welkowitz, 1978; Giles & Smith, 1979; Giles et al., 1986 for comprehensive overviews). While the motivational underpinnings are quite different in this case, the mechanism is the same as in other types of pull effects: the production of the ‘pulled’ acoustic features is determined by the model provided by the interaction partner towards whom one is converging or from whom one is diverging.

If the distinction between push and pull effects is important for differentiating between primary and secondary interjections, it is also important in explaining the configurations between verbal and non-verbal communication. To continue with the above example, let us consider the situation in which one has been invited to dinner at the Dean’s, where, in an effort to provide a special meal, oysters are served. Again, to the dismay of the guests, the worms make their appearance. Those with strict traditional upbringings are not likely to exclaim ‘Eee!’ nor, worse, ‘Yuck!’. More probable are comments like ‘How interesting’, if irony is intended, or ‘Oh my! What a shame’, if the speaker is trying to be helpful (in fact, the precise utterance will depend on the relationship to the host and the desired effect of the speech act). The verbal expression would be accompanied by a particular facial expression, voice quality and intonation. In terms of the reference function, the expression ‘How interesting’ is a noncommittal evaluation of a communally served meal in a social context that prescribes gratitude. In terms of the function of expression, the statement communicates, ‘I feel revolted but I’m a polite person’, and in terms of the appeal function, the communication is, ‘Please feel that I acknowledge your efforts but don’t ask me whether I like the results’.

An interesting question at this point concerns the degree to which non-verbal messages complement or contradict the verbal signal. Does the configuration of the verbal and the non-verbal produce an integral message? To this important question we have no definite answer in spite of many research efforts (see Bugental, Kaswan & Love, 1970; Hess, Kappas & Scherer, 1988; Mehrabian & Ferris, 1967). We can, however, show the extent to which the combination of two signal systems, the digital linguistic system and the analogic non-verbal affect expression system, allows a large set of possibilities to communicate messages under many social constraints (see Giles, Scherer & Taylor, 1979; Scherer, 1985).

How can we link the push/pull concept to the Bühler model? One prediction is that the symptom aspect, i.e. the expression of an internal state, would be push, whereas the symbol and appeal aspects might be pull, with the two obeying different laws. Different factors might determine the nature of the expression in each case. It might also be the antagonism between push and pull, e.g. high physiological arousal pushing voice fundamental frequency up and the conscious attempt to show ‘coolness’ pulling it down, which produces mixed messages. If this were so, it would be important to isolate the two determinants empirically. Another major aim for future research would be to distinguish more clearly the three types of functions. This would help avoid erroneous extremes, such as ‘all animal communication is expression of affect state’, or ‘all animal communication is cognitive referencing’. Both are clearly wrong. There is always a mixture, but researchers rarely distinguish between the two aspects.

Empirical Evidence on the Relationships Between Acoustic Cues and Attributions of Speaker State

How can we differentiate between push and pull effects and how can we analyse the different functions of vocal utterances? These issues will be the topics for the remainder of this paper. One of the potential issues for research is the kind of signal feature covariation we would expect to find under push and pull conditions. If, in a push condition, muscle tension goes up under stress, producing an increase in the fundamental frequency of the voice, we would expect direct covariation between the amount of muscle tension increase measured by electromyography and the increase in fundamental frequency as measured by digital voice analysis. In this covariation model, we would expect a continuous (and probably linear) covariation between the two variable classes. The alternative is what we call the configuration model (see Scherer, Ladd & Silverman, 1984). The configuration model is more ‘linguistic’ than the covariation model, which is itself more psychological in nature. The configuration model argues that to achieve a certain effect in the listener, one uses a particular combination of intonation, accent, word and/or syntactic structure, e.g. a rising intonation contour in a Wh question, a falling one in a Yes/no question. There are no variable dimensions, no continua, in configuration effects: certain classes of phenomena have to co-occur to produce an effect. In terms of push and pull, push is likely to follow covariance rules, while pull, if anything, would follow configuration rules. In trying to understand how communication processes follow either a covariation or a configuration model, we need to gather insight into the determinants.

Scherer, Ladd and Silverman conducted two studies to distinguish covariation and configuration. The first (1984) used a corpus of 66 questions from a large-scale study on interactions between civil servants and other citizens from different regions in Germany. Out of this data set of realistic interactions we culled 66 Wh questions that were homogeneous in structure, but that varied in terms of pragmatic force. Some questions were clearly reproaches, though phrased as Wh questions, while others were factual information questions. Therefore, there was a great deal of variability in the pragmatic content in the 66 questions. We next used filtering or degrading techniques to isolate particular acoustic cues systematically (see Scherer, 1982; Scherer, Feldstein, Bond & Rosenthal, 1985 for a detailed overview). Three such techniques were used: (1) low pass filtering, (2) random splicing, and (3) reversing. Low pass filtering consists of electronically removing the frequencies above about 500 Hz in the spectrum. This produces a sound which resembles mumbling or speech heard through a wall: unintelligible in terms of speech content but allowing the listener to determine the timing and rhythm of the speech and yielding a gross impression of the intonation and intensity contours of the utterances. Random splicing consists of dividing the speech signal into segments of about 300 ms (about three phonetic segments) with adjacent segments


Table 1 The effect of the speech-masking methods on major paralinguistic cues (from Scherer et al., 1985).

Type of masking | Loudness | Fundamental frequency (pitch) | Intonation | Voice quality | Tempo | Pauses | Rhythm
Backwards | Unchanged | Unchanged | Distorted | Distorted | Unchanged | Unchanged | Distorted
Random splicing | Unchanged | Unchanged | Partially masked | Unchanged | Partially masked | Fully masked | Distorted
Electronic filtering | Partially masked | Unchanged | Unchanged | Fully masked, distorted | Unchanged | Unchanged | Unchanged
Pitch inversion | Partially masked | Unchanged | Partially masked | Fully masked | Partially masked | Unchanged | Partially masked
Tone-silence | Fully masked | Fully masked | Fully masked | Fully masked | Partially masked | Unchanged | Partially masked


overlapping by 3 ms. These speech segments were then randomly recombined with some smoothing between segment boundaries. Obviously, this procedure removes all temporal cues such as pauses, rhythm and intonation contours. However, the frequency range remains fully audible and thus provides a content-free sample of the speaker’s voice quality. Reversing simply consists of playing the audio tape in reverse direction. In principle, this procedure leaves all acoustic cues in place except for the sequence. The impression created by this technique is quite unusual, however, because the air sounds as if it were being sucked into the vocal tract rather than radiated from the mouth. Furthermore, a new intonation contour is artificially created, since listeners are not usually able to transpose the contour even if they know they are listening to backwards speech. Obviously, each of these techniques masks different acoustic variables and isolates or highlights others. Table 1 provides a comparative list of the acoustic cues retained by the respective techniques.
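The splicing and reversing procedures can be made concrete with a minimal Python sketch. It assumes that the utterance is available as a plain list of digitised samples; the function names, the default segment length and the fixed random seed are illustrative choices and are not taken from the original studies. The sketch omits the 3 ms overlap and the smoothing at segment boundaries, and it leaves out low pass filtering, which would require a proper filter design.

```python
import random


def reverse_signal(samples):
    """Reversing: return the samples in reverse temporal order."""
    return samples[::-1]


def random_splice(samples, sample_rate, segment_ms=300, seed=0):
    """Random splicing (simplified): cut the signal into consecutive
    segments of about segment_ms milliseconds and recombine them in
    random order. No overlap or boundary smoothing is applied here."""
    seg_len = max(1, int(sample_rate * segment_ms / 1000))
    segments = [samples[i:i + seg_len]
                for i in range(0, len(samples), seg_len)]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    rng.shuffle(segments)
    return [value for segment in segments for value in segment]


# Hypothetical one-second "utterance" sampled at 1000 Hz.
signal = list(range(1000))
backwards = reverse_signal(signal)
spliced = random_splice(signal, sample_rate=1000)
```

Both operations keep every sample and change only the order, which is why the long-term spectrum (and hence a gross impression of voice quality) survives while sequence-dependent cues do not.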

The voice samples produced by using these masking techniques in addition to the full audio unaltered speech sample were used in a series of judgement studies. These were to determine the extent to which judges exposed to the masked conditions would agree in their impressions of the speaker’s attitude and affect with those judges who had access to the full audio version. In order to estimate the degree to which the verbal content of the question would influence the ratings of the full audio samples, we also had the transcripts rated, i.e. judges read the verbal content of the questions without listening to any auditory input. We constructed a rating instrument consisting of the following scales: ‘polite, aggressive, unsure, insistent, aroused, doubtful, relaxed, impatient, friendly, understanding, reproachful’. Out of these 11 adjectives describing speaker state, only one yielded a significant correlation between the full audio and the transcript conditions, namely, ‘aggressive’ (r = 0.43, p < 0.001). Thus, the verbal content contained a sufficient number of cues to arrive at a judgement of aggressiveness which corresponded to that of the judges exposed to the full audio versions, while non-verbal cues were necessary for the corresponding judgements of the other dimensions used in this study (polite, unsure, doubtful, relaxed, impatient, friendly, understanding, reproachful). Therefore, the correlations between the ratings in the masked conditions and the full audio conditions should yield some information concerning the type of cues which mediate the communication of speaker state. Table 2 shows the pattern of correlations.
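Agreement between two rating conditions is expressed here as a correlation coefficient r across utterances. As a hedged illustration, the following Python sketch computes such a coefficient, assuming a Pearson product-moment correlation (the text does not name the coefficient explicitly). The function name and the example ratings are invented for demonstration and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical mean 'polite' ratings for four utterances in two conditions.
full_audio = [1.0, 2.0, 3.0, 4.0]
random_spliced = [1.5, 1.8, 3.2, 3.9]
print(round(pearson_r(full_audio, random_spliced), 2))
```

A high coefficient for a masked condition would indicate that the cues surviving that masking technique carry much of the information used by judges of the full audio version.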

The results show that even when the text of the questions used is rendered unintelligible, much of the affective meaning remains in the acoustic signal. This confirms the covariance model in its claim that non-verbal vocal cues convey affect in a direct and context-independent way. However, the pattern of correlation coefficients in Table 2 shows that this is true only for those masking conditions in which voice quality cues are audible, i.e. in the random splicing and reversing conditions. In both of these conditions, the intonation contour of the sentence is lost or destroyed. This would seem to imply that this feature of speech utterances plays no major role in the communication of affective meaning. Obviously, this is rather counter-intuitive and contradicts empirical evidence showing that this information is relevant for the communication of affect. Could it be that intonation follows configuration rather than covariance rules? In order to investigate this question with the speech material in this study, i.e. the 66 questions, we divided the questions into Wh and Yes/no questions and classified the intonation contours into final fall and final rise (see Scherer, Ladd & Silverman, 1984 for details). Figure 4 shows the differences in affect ratings for the four categories thus created.

Table 2 Correlations between ratings in the different masking conditions (from Scherer et al., 1984).

Condition | Full audio | Random spliced | Filtered
Reversed | Polite [value illegible]a | - | -
Reversed | Insecure 0.41a | - | Insecure 0.49b
Reversed | Challenging 0.66b | - | -
Reversed | Agreeable 0.61b | - | -
Reversed | Aroused 0.42a | Aroused 0.42a | -
Random spliced | Polite 0.43a | |
Random spliced | Agreeable 0.82c | |
Random spliced | Aroused 0.56b | |
Filtered | Polite 0.45b | |

Notes: r is based on N = 24 utterances in each condition. a p < 0.05; b p < 0.01; c p < 0.001.

The results show that intonation contour obviously has a strong effect on the impression of speaker affect but it seems to be mediated by context effects, verbal or syntactic.

One might interpret these results as reflecting the traditional descriptions of ‘normal’ or ‘unmarked’ intonation for the two different question types. The supposedly ‘normal’ combinations of intonation type (i.e. falling Wh questions and rising Yes/no questions) were judged as more polite and agreeable. ‘Marked’ combinations on the other hand were rated rather more negatively. This clearly points to strong configuration effects. It is possible, then, to presume that some acoustic cues, such as voice quality, operate according to covariance rules, whereas others, such as intonation contours, are used in accordance with configuration rules. This would make sense in terms of a psychobiological approach to communication. One could argue that those cues that show a remarkable degree of phylogenetic continuity, such as the differential nature of phonation which yields different voice qualities, are closer to direct covariance with physiological states. In contrast, cues that have been domesticated within a language system, such as intonation, should follow a configuration model.
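The contrast between the two kinds of rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model fitted to the data: the function names are hypothetical, the slope and intercept are arbitrary placeholder values, and the set of ‘unmarked’ combinations simply encodes the description given above (falling Wh questions and rising Yes/no questions).

```python
# Configuration rule: the effect depends on which categories co-occur.
UNMARKED = {("wh", "fall"), ("yes/no", "rise")}


def is_marked(question_type, contour):
    """Return True when the contour is the 'marked' one for this question type."""
    return (question_type, contour) not in UNMARKED


# Covariation rule: the effect varies continuously with a single cue.
def covariation_rating(cue_value, slope=0.5, intercept=1.0):
    """Linear mapping from a continuous cue to a rating.
    slope and intercept are arbitrary illustrative values."""
    return intercept + slope * cue_value


for question_type in ("wh", "yes/no"):
    for contour in ("fall", "rise"):
        print(question_type, contour, "marked:", is_marked(question_type, contour))
```

The configuration function has no continuum at all: changing the contour flips the outcome only in combination with the question type, whereas the covariation function responds to every increment of the cue regardless of context.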

In order to test these notions further, we used digital resynthesis of speech (see Markel & Gray, 1976) to enable us to vary different acoustic cues experimentally in a factorial design. Such an approach obviously avoids the disadvantage of using a natural corpus such as the 66 questions we used in the earlier study, since it allows greater experimental control of the variables under study. The digital


Figure 4 Ratings of affect in utterances with final falls versus those with final rises, for the three scales where intonation interacted with question type. (a) Challenging, (b) agreeable, (c) polite (from Scherer et al., 1984).

[Three panels, (a) to (c); each plots ratings (vertical axis, -1.0 to 1.0) for final fall versus final rise, with separate symbols for Wh questions and Yes/no questions.]

resynthesis technique uses a powerful signal processing algorithm, linear predictive coding (LPC), which reduces the complexity of the analog speech signal to a sequence of reflection coefficients - a set of 12-30 for short segments of speech. These coefficients are then used to resynthesise the original speech wave in the computer. Figure 5 shows a demonstration of this method.
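One standard way of obtaining such reflection coefficients is the autocorrelation method with the Levinson-Durbin recursion. The Python sketch below illustrates that textbook procedure under stated assumptions; it is not necessarily the implementation used in the studies reported here. The function names are illustrative, the inverse filter is written as A(z) = 1 + a_1 z^-1 + ... + a_M z^-M, and sign conventions for reflection coefficients differ between texts.

```python
def autocorrelation(samples, max_lag):
    """Autocorrelation of one speech segment for lags 0..max_lag."""
    n = len(samples)
    return [sum(samples[t] * samples[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, reflection, error): the inverse-filter coefficients
    a[0..order] with a[0] = 1, the reflection coefficients k_1..k_order,
    and the final prediction error energy."""
    if r[0] == 0:
        raise ValueError("segment has zero energy")
    a = [1.0] + [0.0] * order
    error = r[0]
    reflection = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / error
        reflection.append(k)
        updated = a[:]
        for i in range(1, m):
            updated[i] = a[i] + k * a[m - i]
        updated[m] = k
        a = updated
        error *= (1.0 - k * k)  # error shrinks with each added stage
    return a, reflection, error


# Autocorrelation values of a hypothetical first-order process.
coeffs, ks, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

In a resynthesis setting, the coefficients of each short segment could then be altered (or the excitation changed) before the segment is regenerated, which is what makes independent manipulation of individual acoustic cues possible.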

A number of manipulations can be performed on these coefficients in order to systematically vary fundamental frequency, intensity, or even voice quality. Figure 6 shows an example of the manipulation of the intonation contour in a German sentence (‘Aber schriftlich habe ich das nicht bekommen’) producing downtrend and uptrend contours (see Ladd, 1983).

In a series of studies (Bergmann, Goldbeck & Scherer, 1988; Ladd, Silverman, Tolkmitt, Bergmann & Scherer, 1985), we used this technique to systematically vary intonation contour, F0 range, intensity, timing, accent, structure, and other parameters. The advantage of this procedure, as pointed out above, is that all of these acoustic features under study can be manipulated independently of each other in a factorial design while leaving all of the remaining acoustic cues constant. Table 3 shows an example of the kind of experimental design that was used.


Figure 5 Demonstration of the digital resynthesis technique (Linear Predictive Coding).

[Figure content: a block of numerical values from the analysis, followed by the relation below.]

$$E(z) = S(z)\,A(z) = S(z)\sum_{i=0}^{M} a_i z^{-i}$$


Figure 6 Examples of the two contour types used in the experiments. The top half shows F0 tracings for the downtrend (solid line) and uptrend (dashed line) contours, with the anchor points circled. The lower half shows the signal energy, together with a phonetic transcription of the utterance ‘Aber schriftlich habe ich das nicht bekommen’ (from Ladd et al., 1985).

[Axes: F0 in Hz (75 to 175) against time in seconds (0 to 1.5).]

Table 3 Factorial design (2 × 2 × 2 × 3) for a study of speaker state attribution using digital voice resynthesis.

F0 range | Voice quality rough, uptrend contour | Voice quality rough, downtrend contour | Voice quality normal, uptrend contour | Voice quality normal, downtrend contour
Narrow | Items A, B, C | Items A, B, C | Items A, B, C | Items A, B, C
Wide | Items A, B, C | Items A, B, C | Items A, B, C | Items A, B, C
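The crossing of factors in Table 3 can be enumerated programmatically. The following Python sketch lists every cell of the 2 × 2 × 2 × 3 design; the variable names and dictionary keys are illustrative, and the factor levels are taken from the table.

```python
from itertools import product

# Factor levels as given in Table 3.
voice_qualities = ["rough", "normal"]
contours = ["uptrend", "downtrend"]
f0_ranges = ["narrow", "wide"]
items = ["A", "B", "C"]

# Full crossing: one stimulus per combination of factor levels.
design = [
    {"voice_quality": vq, "contour": contour, "f0_range": f0, "item": item}
    for vq, contour, f0, item in product(voice_qualities, contours,
                                         f0_ranges, items)
]

print(len(design))  # 2 * 2 * 2 * 3 = 24 stimuli
```

Because every level of each factor occurs with every level of the others, main effects and interactions can be separated in an analysis of variance.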


Figure 7 Ratings as a function of the five range levels used in experiment III (from Ladd et al., 1985).

[Two panels plotting mean rating (vertical axis) against level of range, 1 to 5 (horizontal axis); upper panel: reproachful, emphatic, contradicting, cooperative; lower panel: involved, arrogant, aroused, annoyed.]

Space militates against reporting all of the data from these studies, but three major types of findings should be highlighted. First, we did not find any interaction effects in the analysis of variance, suggesting that the acoustic variables we studied function largely independently of each other. Secondly, in those studies where we used several speakers and several utterances we found virtually no interaction between these factors and the acoustic variables manipulated. This encourages one to think that the effects can be generalised over a wide range of speakers and utterances. Thirdly, out of the variables studied, F0 range had the most powerful effect by far on the judgement of the raters, particularly on the attributions of arousal. Furthermore, we were able to show that these effects seemed to be a continuous function of changes in range. Figure 7 shows that arousal-related ratings go up with increasing F0 range.
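What the absence of an interaction means can be illustrated with a minimal sketch for two factors with two levels each. The function name is an assumption, and the ratings below are invented purely for illustration; they are not data from the studies reported here.

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2 x 2 table of mean ratings.

    cell_means[a][b] is the mean rating at level a of the first factor
    and level b of the second. A result of 0 means that the two factors
    combine additively, i.e. there is no interaction.
    """
    (m00, m01), (m10, m11) = cell_means
    return (m11 - m10) - (m01 - m00)


# Invented ratings (NOT study data). Rows: voice quality; columns: F0 range.
additive = [[2.0, 3.0], [2.5, 3.5]]      # range adds 1.0 in both rows
non_additive = [[2.0, 3.0], [2.5, 4.5]]  # range adds more in the second row

print(interaction_contrast(additive))      # 0.0
print(interaction_contrast(non_additive))  # 1.0
```

In the additive case the effect of F0 range is the same whatever the voice quality, which is the pattern the analysis of variance results suggest.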

Results for intonation contours and voice quality were complex and seem to require further study. In the case of intonation contours, this may well be due to the important role of the configuration model for this variable. In consequence, we feel that the distinction between configuration and covariance rules may be very useful in understanding the communication of affect in vocal utterances, and it would seem useful to continue this type of research with the aid of modern digital signal manipulation techniques.

Affect Vocalisations as Signals and Symbols

Most of the studies discussed so far have been directed only at the symptom function of vocal utterances. We also need to explore the signal and symbol functions. The paradigm invoking covariation and configuration rules presented above provides an interesting beginning, but it is certainly not sufficient to deal with these two functions. Although the configuration model seems to be directly linked to the symbolic function and possibly also to the signal function of the utterance, with the covariation model being more applicable to the symptom functions, we need to be able to separate symbol and signal functions. Furthermore, at the moment there is a lack of theoretical predictions that would allow hypothesis-guided testing.

The signalling aspect of vocal utterances is particularly underdeveloped. This would seem to be an important task for young and hungry social psychologists of language. For example, one might think of using a rating study approach to try to determine how listeners perceive the results that particular messages have on their own behaviour. In other words, rather than exclusively focusing on listener attributions of speaker state, we might try to develop some rating instruments that measure perceived consequences in the listeners, e.g. becoming frightened, focusing attention on a particular situational feature, laughing at the speaker, etc.

Obviously, this would require research designs that are somewhat closer to real life than the usual speaker judgement studies. However, this approach would seem to open up a number of very interesting questions such as the role of consciousness or awareness of the signalling function of perceived utterances. So far, not much attention has been given to this important aspect of the speech communication process.

Similarly, speech act theorists, while focusing on the signalling aspect, have centred on guessing the intentions of the speaker (see Searle’s more recent work on intention, 1986). The most difficult question, however, concerns the symbolic function of affect vocalisations. Since this function has usually been considered to be the exclusive domain of linguistics, or in particular, semantics, it might seem far-fetched to claim that emotional sound-making carries cognitive representational information, and in that sense, serves the third function in the Bühler model. And yet, let us return to the research on vervet monkey alarm calls with which this paper started. Peter Marler and his collaborators claim a symbolic representational function for these calls, hitherto considered to be exclusively motivational in nature. Could we argue that human affect vocalisations, upon closer look, can also communicate symbolic information? Obviously, we do not have clearly classifiable vocalisation types for the different events that evoke fear in us. Thus, it is probably not via the identification of a particular category that such a symbolic function would work. How, then, can we envisage cognitive representational functions of affect vocalisation?

I will venture a rather speculative proposal on how one might conceive of the symbolic function of vocal affect signs. I will maintain that the acoustic characteristics of an emotional vocalisation reflect the complete pattern of the cognitive appraisal process that produced the emotional state in the sender. This information about the criteria used in the emotion-antecedent evaluation should allow the listener to reconstruct the major features of the emotion-producing event in its effect on the speaker.

In order to explain this postulate I have to set out some recent theorising on emotion. Many theorists in the field of psychology of emotion seem convinced that most human emotions are preceded by cognitive evaluation of events and situations (although the type of cognitive process can be relatively low level, automatic and unconscious). There is a fair amount of agreement on the nature of criteria that are used in appraising events or situations that are important for organisms (see Scherer, 1984, 1988; Frijda, 1986). As an example, Table 4 reproduces the cognitive criteria I have suggested for emotion-relevant event appraisal.

Table 5 presents predictions of the likely evaluation patterns responsible for

Table 4 Cognitive criteria for appraising emotion-producing events (from Scherer, 1985).

1. Novelty check. Evaluating whether there is a change in the pattern of external or internal stimulation, particularly whether a novel event occurred or is to be expected.

2. Intrinsic pleasantness check. Evaluating whether a stimulus event is pleasant, inducing approach tendencies, or unpleasant, inducing avoidance tendencies; based on innate feature detectors or on learned associations.

3. Goal/need significance check. Evaluating whether a stimulus event is relevant to important goals or needs of the organism (relevance subcheck), whether the outcome is consistent with or discrepant from the state expected for this point in the goal/plan sequence (expectation subcheck), whether it is conducive or obstructive to reaching the respective goals or satisfying the relevant needs (conduciveness check), and how urgently some kind of behavioural response is required (urgency subcheck).

4. Coping potential check. Evaluating the causation of a stimulus event (causation subcheck) and the coping potential available to the organism, particularly the degree of control over the event or its consequences (control subcheck), the relative power of the organism to change or avoid the outcome through fight or flight (power subcheck), and the potential for adjustment to the final outcome via internal restructuring (adjustment subcheck).

5. Norm/self compatibility check. Evaluating whether the event, particularly an action, conforms to social norms, cultural conventions, or expectations of significant others (external standards subcheck), and whether it is consistent with internalised norms or standards as part of the self-concept or ideal self (internal standards subcheck).
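The five checks and their subchecks listed in Table 4 can be written down as a small data structure, which makes it easy to see how many separate criteria an appraisal pattern has to specify. This is a minimal sketch; the class and function names are illustrative assumptions, not part of the model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One stimulus evaluation check with its (possibly empty) subchecks."""
    name: str
    subchecks: tuple = ()


# Checks and subchecks exactly as named in Table 4.
CHECKS = (
    Check("novelty"),
    Check("intrinsic pleasantness"),
    Check("goal/need significance",
          ("relevance", "expectation", "conduciveness", "urgency")),
    Check("coping potential",
          ("causation", "control", "power", "adjustment")),
    Check("norm/self compatibility",
          ("external standards", "internal standards")),
)


def criteria():
    """Flatten the checks into the individual criteria that get an outcome."""
    flat = []
    for check in CHECKS:
        if check.subchecks:
            flat.extend(f"{check.name}: {sub}" for sub in check.subchecks)
        else:
            flat.append(check.name)
    return flat


print(len(criteria()))  # 12
```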


Table 5 Hypothetical outcomes of stimulus evaluation checks for selected emotional states (from Scherer, 1986).

Column groups: Goal/need significance (Relevance, Expectation, Conduciveness, Urgency); Coping potential (Control, Power, Adjustment); Norm compatibility (External, Internal).

Emotional state        Novelty   Pleasantness  Relevance  Expectation  Conduciveness  Urgency    Control  Power     Adjustment
Enjoyment/happiness    Low       High          Medium     Consistent   High           Very low   -        -         High
Elation/joy            High      High          High       Discrepant   High           Low        -        -         Medium
Displeasure/disgust    Open      Very low      Low        Discrepant   Low            Medium     Open     Open      High
Contempt/scorn         Open      Low           Low        Discrepant   Low            Low        Open     High      High
Sadness/dejection      Low       Low           High       Discrepant   Obstruct       Low        None     -         Medium
Grief/desperation      High      Low           High       Discrepant   Obstruct       High       Low      Low       Low
Anxiety/worry          Low       Open          Medium     Discrepant   Obstruct       Medium     Open     Low       Medium
Fear/terror            High      Low           High       Discrepant   Obstruct       Very high  Open     Very low  Medium
Irritation/cold anger  Low       Open          Medium     Discrepant   Obstruct       Medium     High     Medium    High
Rage/hot anger         High      Open          High       Discrepant   Obstruct       High       High     High      High
Boredom/indifference   Very low  Open          Low        Consistent   Obstruct       Low        Medium   Medium    High
Shame/guilt            Low       Open          High       Discrepant   Obstruct       Medium     High     Open      Medium

Norm compatibility (External, Internal), entries as printed: High, High, High, High, Low, -, Low, Low, -, -, Very low, Very low [the assignment of these entries to emotional states is not recoverable here].

Note: Dashes indicate that the criterion is not applicable; ‘Open’ indicates that several different evaluation outcomes for this criterion are compatible with the respective emotion.


Table 6 Voice type predictions for specific emotional states (from Scherer, 1986).

Emotional state        Hedonic valence  Activation       Power
Enjoyment/happiness    Wide             Relaxed          Slightly full
Elation/joy            Wide             Medium-tense     Medium-full
Displeasure/disgust    Very narrow      Slightly tense   Slightly full
Contempt/scorn         Narrow           Slightly tense   Medium-full
Sadness/dejection      Narrow           Lax              Thin
Grief/desperation      Narrow           Tense            Thin
Anxiety/worry          Narrow           Slightly tense   Thin
Fear/terror            Narrow           Extremely tense  Very thin
Irritation/cold anger  Narrow           Medium-tense     Medium-full
Rage/hot anger         Narrow           Very tense       Extremely full
Boredom/indifference   Narrow           Relaxed          Slightly full
Shame/guilt            Narrow           Slightly tense   Thin

the elicitation of some of the major emotions. In other words, an attempt is made to identify the patterns of appraisal outcome that must be fulfilled if a particular emotion is to be produced. If the predictions in this table can be empirically supported, we could be reasonably sure that there are stable rule-governed cognitive processes that precede emotional arousal of specific kinds.

If this is the case, then knowing an organism’s emotional state should allow us to infer the emotion-eliciting cognitive processes, and thus, the approximate nature of the emotion-eliciting event. It is possible, then, that the matrix shown in Table 5 is part of our cultural knowledge, part of our social representation of emotion and its eliciting situations. Given that the face validity of the predictions in Table 5 seems to be rather high, this may not be too wild a speculation.
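The inference described here can be sketched as a lookup in both directions. The sketch below uses only three criteria, with outcomes as read from the Novelty, Relevance and Urgency columns of Table 5; the function names and the treatment of ‘Open’ as a wildcard are assumptions made for illustration.

```python
# Outcomes read from the Novelty, Relevance and Urgency columns of Table 5.
# "open" means that several outcomes are compatible with the emotion.
PROFILES = {
    "enjoyment/happiness":   {"novelty": "low",      "relevance": "medium", "urgency": "very low"},
    "elation/joy":           {"novelty": "high",     "relevance": "high",   "urgency": "low"},
    "displeasure/disgust":   {"novelty": "open",     "relevance": "low",    "urgency": "medium"},
    "contempt/scorn":        {"novelty": "open",     "relevance": "low",    "urgency": "low"},
    "sadness/dejection":     {"novelty": "low",      "relevance": "high",   "urgency": "low"},
    "grief/desperation":     {"novelty": "high",     "relevance": "high",   "urgency": "high"},
    "anxiety/worry":         {"novelty": "low",      "relevance": "medium", "urgency": "medium"},
    "fear/terror":           {"novelty": "high",     "relevance": "high",   "urgency": "very high"},
    "irritation/cold anger": {"novelty": "low",      "relevance": "medium", "urgency": "medium"},
    "rage/hot anger":        {"novelty": "high",     "relevance": "high",   "urgency": "high"},
    "boredom/indifference":  {"novelty": "very low", "relevance": "low",    "urgency": "low"},
    "shame/guilt":           {"novelty": "low",      "relevance": "high",   "urgency": "medium"},
}


def appraisal_for(emotion):
    """Emotion -> predicted appraisal outcomes (the inference in the text)."""
    return PROFILES[emotion]


def is_compatible(profile, observed):
    """True if every observed outcome matches the profile or the profile is open."""
    return all(profile[key] in ("open", value) for key, value in observed.items())


def candidate_emotions(observed):
    """Appraisal outcomes -> emotions whose predicted pattern they fit."""
    return [e for e, p in PROFILES.items() if is_compatible(p, observed)]


print(appraisal_for("fear/terror"))
print(candidate_emotions({"novelty": "high", "relevance": "high", "urgency": "very high"}))
print(candidate_emotions({"novelty": "low", "relevance": "medium"}))
```

With all three criteria known, the first query singles out fear/terror; with only two, the second leaves enjoyment/happiness, anxiety/worry and irritation/cold anger as candidates, which shows why the complete pattern matters.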

If listeners are able to identify a particular emotional state of the sender from the acoustic features of the vocalisation, thus inferring the nature of the emotion-producing event, then one might claim a symbolic function for emotional vocalisations. There can be little doubt that humans are rather good at inferring the nature of the sender’s emotional state from his or her vocal utterances. There are quite a large number of recognition studies showing that our ability to identify basic emotions from vocal samples is as good as, if not better than, our ability to recognise emotions from facial expression (see Scherer, 1986). One could go even further. We are not only able to identify emotional states on the basis of acoustic cues, we may even have direct access to the results of the cognitive appraisals that have produced a particular emotional state. Table 6 shows predictions on how we would expect the major phonation characteristics to vary as a result of the major emotion-antecedent evaluation criteria (see Scherer, 1985, 1986). On the basis of


Table 7 Changes predicted for selected acoustic parameters on the basis of the voice type predictions (from Scherer, 1986).

Columns (emotional states): ENJ/HAP, ELA/JOY, DISP/DISG, CON/SCO, SAD/DEJ, GRI/DES, ANX/WOR, FEAR/TER, IRR/COA, RAGE/HOA, BOR/IND, SHA/GUI.

Rows (parameters): F0: perturbation, mean, range, variability, contour, shift regularity; F1 mean; F2 mean; F1 bandwidth; formant precision; Intensity: mean, range, variability; frequency range; high-frequency energy; spectral noise; speech rate; transition time.

[The cell entries of the table are not legible in this copy.]

Note: ANX/WOR, anxiety/worry; BOR/IND, boredom/indifference; CON/SCO, contempt/scorn; DISP/DISG, displeasure/disgust; ELA/JOY, elation/joy; ENJ/HAP, enjoyment/happiness; FEAR/TER, fear/terror; GRI/DES, grief/desperation; IRR/COA, irritation/cold anger; RAGE/HOA, rage/hot anger; SAD/DEJ, sadness/dejection; SHA/GUI, shame/guilt. F0, fundamental frequency; F1, first formant; F2, second formant. >, increase; <, decrease. Double symbols indicate increased predicted strength of the change. Two symbols pointing in opposite directions refer to cases in which antecedent voice types exert opposing influences.

this table, a very detailed set of predictions can be produced on which individual acoustic characteristics should be related to particular emotional states (Table 7).

If this line of reasoning is correct, one might conclude that by appropriate inferences from particular acoustic cues, receivers should be able to judge not only the nature of the emotional state of the speaker but also, and maybe even more directly, the outcomes of the pattern of cognitive appraisals which have produced the respective emotional states. In consequence, listeners should also be able to infer the approximate nature of the emotion-producing event or situation as well as information about the speaker’s ego involvement and coping potential. If this were the case, and if this effect were to be powerful enough to transcend individual idiosyncrasies and the influence of contextual clues, then one would be justified in claiming a symbolic representational function for non-verbal vocal affect expression.
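How far such inferences could go at the level of the coarse voice type labels can be checked with a short sketch. It encodes the predictions of Table 6 and groups emotional states that receive the identical combination of labels; the function name is an illustrative assumption, and the grouping is a reading aid rather than a claim made in the text.

```python
from collections import defaultdict

# Voice type predictions from Table 6:
# (hedonic valence, activation, power) for each emotional state.
VOICE_TYPES = {
    "enjoyment/happiness":   ("wide", "relaxed", "slightly full"),
    "elation/joy":           ("wide", "medium-tense", "medium-full"),
    "displeasure/disgust":   ("very narrow", "slightly tense", "slightly full"),
    "contempt/scorn":        ("narrow", "slightly tense", "medium-full"),
    "sadness/dejection":     ("narrow", "lax", "thin"),
    "grief/desperation":     ("narrow", "tense", "thin"),
    "anxiety/worry":         ("narrow", "slightly tense", "thin"),
    "fear/terror":           ("narrow", "extremely tense", "very thin"),
    "irritation/cold anger": ("narrow", "medium-tense", "medium-full"),
    "rage/hot anger":        ("narrow", "very tense", "extremely full"),
    "boredom/indifference":  ("narrow", "relaxed", "slightly full"),
    "shame/guilt":           ("narrow", "slightly tense", "thin"),
}


def shared_voice_types():
    """Return the voice types predicted for more than one emotional state."""
    groups = defaultdict(list)
    for emotion, voice_type in VOICE_TYPES.items():
        groups[voice_type].append(emotion)
    return {vt: ems for vt, ems in groups.items() if len(ems) > 1}


print(shared_voice_types())
```

On these labels alone, only anxiety/worry and shame/guilt receive the same prediction, so a listener would need finer acoustic cues or context to separate that pair.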


This leads to a rather speculative conclusion. Given the obvious phylogenetic continuity in non-verbal affect communication, one could attempt to use a reconstruction of the sequence of development in the evolution of emotional communication systems as a pointer to the development of language. If vocal emotion signalling includes at least rudimentary symbolic representational functions, then these may be at the origin of the development of the more fully symbolically organised communication system. Obviously, symptom and signal functions are prevalent. However, in the course of evolution they have become differentiated, a development which is certainly due in part to the increasing complexity of the emotion system in higher species and the more elaborate cognitive appraisal patterns that precede these. If emotion-antecedent cognitive appraisal criteria become more differentiated, then emotional states and their vocal expressive manifestations will also be increasingly differentiated. The complexity of the acoustic patterns in human affect vocalisation reflects the complexity of cognitive evaluation that produces emotional states. This leads me to venture a wild speculation that would certainly have been banned by the French Société de Linguistique when they prohibited any further theorising on the origin of language in 1866 (see Jespersen, 1921: 96): could it be that the specific acoustic patterns representing emotion-eliciting objects or events were paving the way for semanticity in protolanguage?

Note

1. This article is based on an Invited Address delivered at the Third International Conference on Social Psychology and Language held in Bristol, England in July 1987. Given the nature of the original presentation, this paper is mainly directed towards an outline of the work conducted by the author and his collaborators.

References

Bergmann, G., Goldbeck, T. and Scherer, K. R. (1988) Emotionale Eindruckswirkung von prosodischen Sprechmerkmalen. Zeitschrift für experimentelle und angewandte Psychologie, 35, 167-200.

Brown, C. (1982) Auditory localisation and primate vocal behaviour. In C. T. Snowdon, C. H. Brown & M. R. Petersen (eds) Primate Communication. Cambridge: Cambridge University Press.

Bugental, D. E., Kaswan, J. W. and Love, L. R. (1970) Perception of contradictory meanings conveyed by verbal and nonverbal channels. Journal of Personality and Social Psychology, 16, 647-55.

Bühler, K. (1934) Sprachtheorie. Jena: Fischer.

Cheney, D. L. (1984) Category formation in vervet monkeys. In R. Harré & V. Reynolds (eds) The Meaning of Primate Signals. Cambridge: Cambridge University Press.

Darwin, C. (1872) The Expression of Emotion in Man and Animals. London: John Murray.

Feldstein, S. and Welkowitz, J. (1978) A chronography of conversation: In defense of an objective approach. In A. W. Siegman & S. Feldstein (eds) Nonverbal Behavior and Communication. Hillsdale, New Jersey: Lawrence Erlbaum.


Frijda, N. (1986) The Emotions. Cambridge: Cambridge University Press.

Giles, H. and Smith, P. M. (1979) Accommodation theory: Optimal levels of convergence. In H. Giles & R. St. Clair (eds) The Social Psychology of Language. Oxford: Blackwell.

Giles, H., Scherer, K. R. and Taylor, D. M. (1979) Speech markers in social interaction. In K. R. Scherer & H. Giles (eds) Social Markers in Speech. Cambridge: Cambridge University Press.

Giles, H., Mulac, A., Bradac, J. J. and Johnson, P. (1986) Speech accommodation theory: The first decade and beyond. In M. McLaughlin (ed.) Communication Yearbook 10. Newbury Park, California: Sage.

Goffman, E. (1969) The Presentation of Self in Everyday Life. New York: Anchor.

- (1979) Response cries. In M. von Cranach, K. Foppa, W. Lepenies & D. Ploog (eds) Human Ethology: Claims and Limits of a New Discipline. Cambridge: Cambridge University Press.

Gouzoules, S., Gouzoules, H. and Marler, P. (1984) Rhesus monkey (Macaca mulatta) screams: Representational signalling in the recruitment of agonistic aid. Animal Behaviour, 32, 182.

Hess, U., Kappas, A. and Scherer, K. R. (1988) Multichannel communication of emotion: Synthetic signal production. In K. R. Scherer (ed.) Facets of Emotion: Recent Research. Hillsdale, New Jersey: Lawrence Erlbaum.

Jespersen, O. (1921) Language: Its Nature, Development, and Origin. New York: Macmillan (reprinted 1964, New York: Norton).

Jürgens, U. (1979) Vocalisation as an emotional indicator: A neuroethological study in the squirrel monkey. Behaviour, 69, 88-117.

Ladd, D. R. (1983) Phonological features of intonational peaks. Language, 59, 721-59.

Ladd, D., Silverman, K., Tolkmitt, F., Bergmann, G. and Scherer, K. (1985) Evidence for the independent function of intonation contour type, voice quality, and F0 range in signalling speaker affect. Journal of the Acoustical Society of America, 78, 435-44.

Leyhausen, P. (1967) Biologie von Ausdruck und Eindruck (Teil 1). Psychologische Forschung, 31, 113-76.

Markel, J. D. and Gray, A. H. (1976) Linear Prediction of Speech. New York: Springer-Verlag.

Marler, P. (1984) Animal communication: Affect or cognition? In K. R. Scherer & P. Ekman (eds) Approaches to Emotion. Hillsdale, New Jersey: Lawrence Erlbaum.

Marler, P. and Tenaza, R. (1977) Signaling behavior of apes with special reference to vocalisation. In T. A. Sebeok (ed.) How Animals Communicate. Bloomington: Indiana University Press.

Mehrabian, A. and Ferris, S. R. (1967) Inference of attitudes from nonverbal communication in two channels. Journal of Consulting Psychology, 31, 248-52.

Morton, E. S. (1977) On the occurrence and significance of motivation-structural rules in some bird and mammal sounds. American Naturalist, 111, 855-69.

Searle, J. (1986) On Intention. Cambridge: Cambridge University Press.

Scherer, K. R. (1977) Affektlaute und vokale Embleme. In R. Posner & H. P. Reinecke (eds) Zeichenprozesse - Semiotische Forschung in den Einzelwissenschaften. Wiesbaden: Athenaion.

- (1982) Methods of research on vocal communication: Paradigms and parameters. In K. R. Scherer & P. Ekman (eds) Handbook of Methods in Nonverbal Behaviour Research. Cambridge: Cambridge University Press.

- (1984) On the nature and function of emotion: A component process approach. In K. R. Scherer & P. Ekman (eds) Approaches to Emotion. Hillsdale, New Jersey: Lawrence Erlbaum.

- (1985) Vocal affect signalling: A comparative approach. In J. S. Rosenblatt, C. Beer, M. C. Busnel & P. J. B. Slater (eds) Advances in the Study of Behavior, vol. 15. New York: Academic Press.

- (1986) Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99, 143-65.


Scherer, K. R. (1988) Criteria for emotion-antecedent appraisal: A review. In V. Hamilton, G. H. Bower & N. H. Frijda (eds) Cognitive Perspectives on Emotion and Motivation. Dordrecht: Kluwer.

Scherer, K. R., Feldstein, S., Bond, R. N. and Rosenthal, R. (1985) Vocal cues to deception: A comparative channel approach. Journal of Psycholinguistic Research, 14, 409-25.

Scherer, U., Helfrich, H. and Scherer, K. R. (1980) Internal push or external pull? Determinants of paralinguistic behaviour. In H. Giles, P. Robinson & P. Smith (eds) Language: Social Psychological Perspectives. Oxford: Pergamon.

Scherer, K. R., Ladd, D. R. and Silverman, K. (1984) Vocal cues to speaker affect: Testing two models. Journal of the Acoustical Society of America, 76, 1346-56.

Schlenker, B. (1980) Impression Management: The Self-concept, Social Identity, and Interpersonal Relations. Monterey, California: Brooks/Cole.

Seyfarth, R. M. and Cheney, D. L. (1982) How monkeys see the world: A review of recent research on East African vervet monkeys. In C. T. Snowdon, C. H. Brown & M. R. Petersen (eds) Primate Communication. Cambridge: Cambridge University Press.

Tembrock, G. (1975) Die Erforschung des tierlichen Stimmausdrucks (Bioakustik). In F. Trojan (ed.) Biophonetik. Mannheim: Bibliographisches Institut.

Wundt, W. (1903) Grundzüge der physiologischen Psychologie, Band 3. Leipzig: Engelmann.
