From Human Gesture to Synthetic Action
Michael Kipp
University of the Saarland (DFKI)
Stuhlsatzenhausweg 4, 66123 Saarbrücken
kipp@dfki.de
ABSTRACT
Embodied agents are still looking for their own body language. Our efforts aim at analyzing human speakers to extract individual repertoires of gesture. We argue that TV personalities or actors are better suited for this purpose than ordinary subjects. For analysis, we propose an annotation scheme and present the tool Anvil for transcoding gesture and posture. For transfer to synthetic agents, we suggest thinking of gestures as categorizable into equivalence classes, a subset of which can make up an agent's nonverbal repertoire. Results of investigating different levels of gesture organisation and posture shifts are presented. The concept of G-Groups is introduced, which is hypothesized to correspond to rhetorical devices. We also give a brief sketch of how these results are planned to be integrated in a multimodal generation system for presentation teams.
Keywords
Nonverbal Behavior, Embodied Agents, Lifelike Characters
1. INTRODUCTION
Nonverbal behavior like gesture and posture plays a crucial role in making communication smoother, facilitating understanding and interacting in a social way. With the rise of synthetic actors in the human-computer interface, this area has experienced a surge of interest in an effort to make agents more "life-like". At DFKI, we pursue the idea of interfacing with the user via teams of agents that interact with each other in what can be considered a performance [2]. Information is conveyed indirectly through observation of dialogues. This offers possibilities of more clearly structured, more socially interesting, more subtle and more entertaining presentation [1]. First informal user studies of this approach have been promising but also revealed a need for more sophisticated generation of nonverbal behavior. As opposed to single-agent scenarios [8, 24], we must deal with the additional challenge of making the agents distinguishable individuals. A basic requirement for this is individual repertoires of nonverbal behavior.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Copyright 2001 ACM 0-89791-88-6/97/05 ... $5.00.
Our general research agenda has three phases. First, by looking at empirical data, we search for consistent gestures that could be taken as equivalence classes in terms of form and function, and assemble them into individual repertoires. Second, we aim to integrate these results in a running application by modeling instances of these equivalence classes as animation clips. Third, a system evaluation will assess its nonverbal effectiveness.
Looking at empirical data has been done before for other systems [8, 24]. Such studies have a long history in psychological, anthropological and semiotic research [25, 15, 9]. The majority of the literature has been concerned with general properties of "spontaneous gesture" [22]. Subjects were ordinary people chosen without regard to quality of gesture. Also, collecting individual repertoires has usually not been undertaken [23]. For effective use in synthetic characters, though, the time has definitely come to analyze the kind of speaker we aim to model: a rhetorically proficient speaker who uses his/her nonverbal repertoire to maximum effect, thus giving flesh to the motivation of using a body at all. For such speakers we need to identify individual repertoires, patterns of rhetorically effective delivery and all the different sources that inform the selection of nonverbal signals. We envision a state of the art where interchangeable gesture profiles can be used and reused to give a synthetic character its very special human touch.
But can there be any such thing as a repertoire? Is the space of personal gesticulation not infinite? As has been argued before [17], this strongly depends on the topic of the talk. Describing spatial relations (e.g. giving directions to a foreigner on the street) or actions (e.g. recounting cartoon animations) results in a myriad of different, often singular gestures, most of them probably made up on the spot for the specific purpose at hand. Most such gestures would be pointless to model. On the other hand, "the more abstract and metaphorical the content the gesture pertains to, the more likely we are to observe consistencies in the gestural forms employed" [17]. We understand these "consistencies" as equivalence classes of gesture that can be found, modeled and used for agents involved in "abstract" talk. Abstract topics can be as diverse as sales dialogues, weather forecasts, news reports or literary discussions.
This paper describes work in progress, covering phase one (empirical investigation) and aspects of phase two (integration) in our research agenda. We report on the annotation of video material, the coding scheme involved and coding reliability. The tool Anvil [18] (freely available for research purposes at http://www.dfki.de/~kipp/anvil), especially developed for this project, is presented. Results about repertoires of high-frequency gestures, and about gesture timing and functions, are laid out. Finally, we outline a sample application where the found results can be integrated.

Figure 1: Kinesic hierarchy (G-Phase; G-Phrase and G-Phrase Complex; G-Group and G-Group Complex).
2. ANNOTATION
Our data was taken from the German TV show "Literary Quartet", where four people present and discuss recent book releases. The show is ideal for our purposes for two reasons. First, there is a good mix of abstract talk and interaction. Second, the two speakers chosen for this research, called MRR and HK, have active, proficient gesticulation and a strongly perceived personality. "Proficient" is an intuitive concept here, not to be fully discussed. We only point out that criteria like clear gesture segmentation (in analogy to clear articulation in speech), clear gestural forms, and variation in form and tempo could all contribute to the perception of proficiency.
For analysis, we transcribed speech using PRAAT (by Boersma/Weenink: http://www.fon.hum.uva.nl/praat) and encoded text structure according to Rhetorical Structure Theory (RST) [20] with the RST Tool. Both data were imported into Anvil, where posture and gesture were encoded as described below.
2.1 Gesture Structure
Gesture is a fleeting concept. In order to be able to transcribe gesture and talk about timing and function, a structural framework is indispensable.

Gestures can be dissected into more elementary movement phases [15] or clustered together, then constituting a higher unit with a function of its own. This is the essence of the hierarchical view on gesture structure as proposed by [15, 22] and depicted in Fig. 1 (we replaced the G-Unit layer by the G-Group layer and added complexes, see below).

The G-Phase layer comprises the movement phases of a gesture. Since this is not the focus of this paper, all we need to know about this layer is that the stroke is the most energetic part of a gesture, usually the part that is synchronized with speech (cf. [19, 15, 22] for further reading). Gestures themselves are located on the G-Phrase layer and consist of one or more phases. Many classification systems for gesture have been proposed [23, 10, 22, 9]. We settled on a compromise between semiotic and functional views [16], using the following categories: emblems, adaptors, deictics, iconics, metaphorics and beats. Our priority was clearly that the categories are easy to identify (see Section 2.2 for subcategorization).
Figure 2: G-Group features with intensity arrows. Features: handedness (right hand, left hand, both hands); dynamic vs. static; circular vs. non-circular; curved vs. straight; direct vs. round; quick vs. slow; inward vs. outward; vertical, diagonal, horizontal; tense vs. relaxed; narrow vs. expansive.
On the next higher layer, we define G-Groups as sequences of contiguous gestures that share the cohesive properties shown in Fig. 2 (inspired by [21]). E.g., a stretch of two-handed gestures forms a G-Group in opposition to a following group of left-handed gestures. A G-Group has no type, its sole function being the grouping together of gesture phrases. We diverge from the literature [15, 22], where the next higher layer is usually the G-Unit, defined as subsequent gestures between two rest positions, because we did not find any obvious correlation to speech. Kendon called gesture sequences with shared handedness "groupings or parts" in [15] and has thus considered aspects of G-Groups without putting them into the hierarchy.

On top of the three-layered hierarchy, we call special patterns of G-Phrases and G-Groups complexes (see [13] for an analogous language concept). Repetition of gesture is such a pattern on the G-Phrase layer. Likewise, gestures that are frequently used together by a speaker form a complex.
2.2 Coding Scheme
Gestures were transcoded on three levels: G-Phase, G-Phrase and G-Group. We focus on G-Phrases, whose subcategories make up our notion of gesture equivalence classes.

Gestures are first classified into the six major categories: Emblems are gestures with conventionalized meaning that can usually be replaced by a verbalization [5]. Adaptors are self-touches or object-touches and serve to disperse excess energy (nervosity) and/or to focus on cognitive processes [12]. Deictics are pointing gestures. Iconics are gestures that correspond to a speech concept by some direct resemblance relation (e.g. writing into the air while spelling out a name), whereas metaphorics relate to speech in a more abstract way (e.g. holding your palm like a cup when asking a question, as if ready to "receive" a material answer, cf. [22]). Finally, we decided to treat beats as a rest class on the G-Phrase layer because we see a functional separation between the G-Phrase and G-Phase layers:

Rhythmical beats, which accompanied most of the gestures, are encoded on the G-Phase layer. Almost every gesture can have such rhythmical beats, which serve certain functions (e.g. highlighting new information [22]). On the G-Phrase layer, however, a different level of meaning is added through the specific form of a gesture. What, then, are the traditional beat or baton gestures found in the literature [11, 22]? In our opinion, they are gestures whose form does not carry any meaning, only their rhythm being important. Therefore, on the G-Phrase layer, such gestures can be considered a rest class. This is a convenient way of dealing with the alternative view that beats are superimposed on other gestures, which would complicate annotation.
For the emblems, iconics and metaphorics there is a range of 67 subcategories which were found during annotation and documented in a manual with video stills. Entries for emblems specify form, function, concomitant speech, verbalization and similar gesture categories (to avoid confusion). Entries for iconics and metaphorics specify form and function. Deictics and adaptors are coded with one parameter each: aim of deixis (self, addressee, space) and touch region (head, shoulder, table) for the adaptor.

These subcategories are what we hypothesize to be equivalence classes, i.e. instances of one subcategory are taken to be exchangeable with any other instance in this group.

Note that qualitative parameters [4] are not coded on the G-Phrase layer. Instead, they are handled on the G-Group layer. Parameters like handedness, speed or direction of movement are seen not as inherent parameters of single gestures but as cohesive glue grouping sequences of gestures together into G-Groups. Subsequent gestures belong to the same G-Group iff they share all the same properties of Fig. 2.

For posture, we distinguished between two positions (upright, relaxed) and four kinds of motion (legs-cross, legs-uncross, upper body, and whole body).
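The "same G-Group iff all cohesive properties agree" rule amounts to partitioning the gesture sequence into maximal runs of feature-identical gestures. A minimal sketch, with illustrative feature names standing in for the actual Fig. 2 properties:

```python
# Hypothetical sketch: partition a gesture sequence into G-Groups.
# Subsequent gestures join the same G-Group iff they agree on ALL
# cohesive features. Feature names here are illustrative stand-ins.

FEATURES = ("handedness", "speed", "extent")

def g_groups(gestures):
    """Split a list of gesture dicts into maximal runs that agree
    on every cohesive feature."""
    groups = []
    for g in gestures:
        props = tuple(g[f] for f in FEATURES)
        if groups and groups[-1][0] == props:
            groups[-1][1].append(g)      # same properties: extend run
        else:
            groups.append((props, [g]))  # property change: new G-Group
    return [members for _, members in groups]

seq = [
    {"name": "conduit", "handedness": "both",  "speed": "quick", "extent": "narrow"},
    {"name": "dismiss", "handedness": "both",  "speed": "quick", "extent": "narrow"},
    {"name": "point",   "handedness": "right", "speed": "quick", "extent": "narrow"},
]
print([len(g) for g in g_groups(seq)])  # [2, 1]
```

The handedness change after the second gesture closes the first G-Group, mirroring the two-handed vs. left-handed example above.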
2.3 Annotation Tool
For transcoding gesture and posture, the generic annotation tool Anvil [18] (Annotation of Video and Language) was developed (Fig. 3). It allows annotation of video files (e.g. QuickTime or AVI) on multiple layers, so-called tracks. In primary tracks the user can add elements by marking begin and end time. Secondary tracks contain elements that point to elements of another, reference track. To insert an element, the user specifies the first and last element in the reference track. All track elements contain attribute-value pairs. The attributes' names and types are specified by the user for each track. Attribute types are ValueSet, String, Boolean, Number and MultiLink. For ValueSets the user defines a set of possible values that will reappear in the GUI as an option menu. String-typed attributes offer a string input field, Booleans a checkbox and Number types a slider. MultiLink types allow the selection of arbitrary track elements in the annotation, thereby permitting within-level and cross-level linkage.

For the scheme outlined in Section 2.2, G-Phases were encoded in a primary track, G-Phrases in a secondary track whose elements point to G-Phase elements. Likewise, G-Groups were coded in a secondary track pointing to the G-Phrase track (see Fig. 3).
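The primary/secondary track idea can be illustrated with a minimal data-model sketch. This is not Anvil's actual implementation or file format, only an assumption-laden illustration: a secondary element derives its time span from the first and last reference element it points to.

```python
# Illustrative sketch (NOT Anvil's real data model): secondary-track
# elements point at a first and last element of a reference track
# and inherit their time span from them.

class Element:
    """Primary-track element with explicit begin/end times."""
    def __init__(self, begin, end, **attrs):
        self.begin, self.end, self.attrs = begin, end, attrs

class SecondaryElement:
    """Secondary-track element spanning its referenced elements."""
    def __init__(self, first, last, **attrs):
        self.first, self.last, self.attrs = first, last, attrs

    @property
    def begin(self):
        return self.first.begin

    @property
    def end(self):
        return self.last.end

# A G-Phrase pointing at two G-Phase elements (times in seconds):
prep   = Element(0.0, 0.4, phase="preparation")
stroke = Element(0.4, 0.7, phase="stroke")
phrase = SecondaryElement(prep, stroke, category="emblem.dismiss")
print(phrase.begin, phrase.end)  # 0.0 0.7
```

A G-Group element would point at G-Phrase elements in the same way, giving the three-layer chain described above.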
Anvil stands out in comparison to similar tools in that it is the only non-commercial, fully implemented, XML-based and platform-independent tool specifically designed for this kind of annotation work. Especially cross-level linkage is a feature rarely found in other tools.

Figure 4: Increase of subcategories with increasing material (left) and percentage of gesture occurrences covered by most frequent categories (right).
2.4 Reliability
Inter-coder reliability was analyzed in two different studies on the G-Phrase level to check consistency of the 67 gesture subcategories. In each study, two coders independently annotated 2:54 minutes (study 1) and 2:26 minutes (study 2) of material containing pre-annotated G-Phases of speaker MRR. Segmentation into G-Phrases (gestures) was near-perfect:

            segmentation   subcat. agreement   subcat. kappa
  study 1       93.0%            64.0%              0.60
  study 2      100.0%            79.4%              0.78

In subcategory agreement, the first study yielded critical results. The second study, building on insights from the first study and the grown experience of the coders, resulted in a satisfying 79% agreement and a kappa value of 0.78, which is very close to "good agreement" [7].
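The kappa statistic [7] corrects raw agreement for chance. A minimal sketch with toy labels (not the study data), assuming the standard Cohen formulation:

```python
# Sketch: Cohen's kappa for two coders' category labels (cf. [7]).
# kappa = (P_obs - P_exp) / (1 - P_exp), where P_exp is the
# agreement expected by chance from each coder's label distribution.
from collections import Counter

def kappa(a, b):
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb[k] for k in ca) / n**2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

coder1 = ["dismiss", "wipe", "conduit", "dismiss", "attention"]
coder2 = ["dismiss", "wipe", "dismiss", "dismiss", "attention"]
print(round(kappa(coder1, coder2), 2))  # 0.71
```

Here raw agreement is 80%, but kappa is lower because both coders use "dismiss" often, making chance agreement on it likely.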
3. RESULTS
Our results yielded evidence that our equivalence class approach is feasible in terms of extraction (see above) and modeling (by taking high-frequency gestures, see below). We will also present insights on timing and function.

In all, we annotated 40:09 minutes of data of the two speakers MRR and HK. Tab. 1 shows the number of phases, phrases and groups found per speaker. We identified 67 different gesture subcategories for both. Naturally, the more material we annotated, the more subcategories we found. Fig. 4 shows the increase of gesture subcategories with increasing material for the speaker MRR (left diagram, upper line). When we consider only those subcategories whose instances make up more than 1% of all occurrences, we obtain a surprisingly constant number of 21-24 subcategories (left diagram, lower line). These high-frequency subcategories seem natural candidates for being modeled as animations. But how much of the original occurrences of gesture does this set of subcategories cover? Fig. 4 (right) shows the coverage of MRR's 21 subcategories over 1%. Although it is slightly declining with growing data, in the final corpus of 604 occurrences the "over 1%" subcategories still cover about 83%.

For speaker HK we found 19 "over 1%" subcategories. Comparing the two sets of HK and MRR, there was an overlap of 12 subcategories. Individuality is thus found already in the different repertoires, and furthermore in different frequency distribution and usage in function and timing, as treated below.
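The "over 1%" analysis reduces to counting subcategory frequencies, thresholding, and summing the retained counts. A sketch with toy counts (not the corpus data):

```python
# Sketch: find subcategories whose instances exceed 1% of all
# occurrences, and the share of occurrences they cover (toy data).
from collections import Counter

def high_freq_coverage(occurrences, threshold=0.01):
    counts = Counter(occurrences)
    total = len(occurrences)
    # keep subcategories strictly above the relative threshold
    frequent = {c for c, n in counts.items() if n / total > threshold}
    coverage = sum(counts[c] for c in frequent) / total
    return frequent, coverage

occ = ["conduit"] * 50 + ["dismiss"] * 30 + ["attention"] * 19 + ["rare_x"]
frequent, cov = high_freq_coverage(occ)
print(sorted(frequent), cov)  # ['attention', 'conduit', 'dismiss'] 0.99
```

In the real corpus the frequent set stays at roughly 21-24 subcategories while coverage settles around 83%, as reported above.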
Figure 3: Anvil tool for multi-layered annotation of audio-visual data.
             MRR     HK    total
  G-Phases   1,422   663   2,085
  G-Phrases    604   265     869
  G-Groups     286   103     389

Table 1: Number of encoded entities.
3.1 Gesture Analysis
Gestures are timed to co-occur with related speech concepts. Kendon introduced the notion of idea units as the common origin of speech and gesture production [15]. These would be manifest in intonation units (tone units) on one hand and G-Phrases on the other. We tried to find relationships to syntactic entities, as they would be more readily available in a speech generation architecture. The following timing patterns occurred:

1. Direct: Gestures synchronize their stroke directly with a word (noun, verb, adverb...) or a noun/prepositional phrase (NP, PP).

2. Indirect: The stroke does not cover the correlated noun, verb etc. itself, but closely linked information, e.g. the noun's modifier, the verb's transitive object or the verb's negation particle.

3. Init: Gestures that refer to concepts deeper in a clause or phrase but only cover the first 1-2 words; this is especially encountered with gestures that illustrate a process and correlate to a verb. They occur at the beginning of a verbal phrase (VP) where the verb is in end position, alleviating the fact that in German the verb is often located at the back of the sentence.

4. Span: Gesture stroke(s) (plus hold) cover(s) a whole clause or phrase.
What follows are some insights on function and timing we gathered for each gesture type, for G-Phrase complexes, G-Groups and G-Group complexes.

Emblems should be most easily identified since their meaning is conventionalized. Functions identified are: speech act, display emotion, display attitude (certainty) and semantic illustration. Some emblems always work with the same timing pattern; e.g., MRR's most frequent speech act emblem "attention" consistently occurs with span timing. Emblems like this also convey a strong high-status message (according to [14]) that may partly explain MRR's reputation as being highly dominant, a point to be investigated further in the future.
Adaptors are considered to possibly co-occur with "significant points of the speech flow" [25]. In our data, adaptors occurred while the speaker was searching for a word (speech failure) or with the beginning of an RST segment, usually covering a short pause. According to Johnstone, all adaptors convey low status [14].

Deictic gestures directed at the addressee function as (1) reference to the addressee by direct sync with pronouns ("you", "your"), (2) reference to the addressee's utterance, timed with the span or init pattern, and (3) turn-taking signals [26]: HK exclusively signaled hold-turn, MRR only yield-turn (MRR chairs the discussion). Pointing directly at somebody was also found capable of expressing aggression.

Iconics illustrate objects, processes and qualities [15]. Object gestures are usually in direct sync with a noun. Process gestures sync with a verb (direct), the verb's transitive object (indirect) or the beginning of the VP (init). Quality gestures occur in direct sync with an NP.

Metaphorics are similar to iconics in usage, only their meaning is more generic. The most common conduit gesture [22], e.g., can co-occur with almost any object or abstract notion in speech. Some process gestures were found to co-occur with connectives or clause borders, metaphorically indicating motion to the next part of the discussion.
  G-Phrase g1                G-Phrase g2              ~P(g2|g1)
  emblem.dismiss             emblem.wipe                0.21
  metaphoric.conduit-ing     metaphoric.conduit-ing     0.20
  emblem.dismiss             emblem.dismiss             0.20
  emblem.attention           emblem.attention           0.14
  emblem.dismiss             emblem.dismiss             0.14
  metaphoric.conduit         iconic.strength            0.13
  deictic.space              deictic.space              0.13
  metaphoric.conduit         metaphoric.conduit         0.10

Table 2: Most frequent bigrams of MRR's gestures (G-Phrases) with estimated conditional probability.
G-Phrase Complexes are formed, e.g., by repetition. Repetition adds cohesion to the discourse and can bind together parts of a sentence that are separated by embedded structures. But are all gestures suitable for repetition? Statistical analysis results in the patterns in Tab. 2 for speaker MRR (only the most frequent, together with estimated conditional probability). More importantly, if the same two gestures occur together again and again, they can be interpreted as an idiosyncratic compound gesture that extends the original repertoire, thus adding to personal style. We found MRR's frequent emblem sequence "dismiss-wipe" highly characteristic of the speaker.
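The conditional probabilities in Tab. 2 can be estimated from the annotated gesture sequence by simple bigram counting. A sketch with a toy sequence (not the corpus):

```python
# Sketch: estimating ~P(g2 | g1) from a sequence of gesture labels
# by maximum likelihood: count(g1, g2) / count(g1 as first element).
from collections import Counter

def bigram_probs(seq):
    pairs = Counter(zip(seq, seq[1:]))   # adjacent gesture pairs
    firsts = Counter(seq[:-1])           # how often g1 starts a pair
    return {(g1, g2): n / firsts[g1] for (g1, g2), n in pairs.items()}

seq = ["dismiss", "wipe", "dismiss", "dismiss", "conduit"]
probs = bigram_probs(seq)
print(round(probs[("dismiss", "wipe")], 2))  # 0.33
```

With more data, high-probability pairs such as "dismiss-wipe" surface as candidate compound gestures.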
G-Groups are formed by gestures that share the same properties from Fig. 2. Most G-Groups correspond to one of the two rhetorical devices stated by [3]: List of Three and Contrast. These devices are nonverbally marked in public speaking to coordinate audience reaction (laughter, applause). More generally, List of Three and Contrast can be taken as RST relations and G-Groups as markers for rhetorical segments.

G-Group Complexes. A two-handed gesture is more visible than a one-handed one, an expansive gesture more than a narrow one. To capture a grading of intensity, we introduce an intensity gradation for some dichotomies in Fig. 2; intensity rises in the direction of the arrow. Now, G-Group A is more intense than G-Group B if A has at least one more intense property. If there are both more and less intense properties, there is no relation. Speaker MRR organizes his G-Groups such that intensity rises (crescendo) and then suddenly drops to give weight to the last part of speech. A simpler identified pattern is contrast: two contrasting segments are covered by G-Groups differing in intensity. A third usage of G-Groups is the marking of embedded clauses to make the surrounding utterance more cohesive.
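This "more intense" relation is a partial order: A dominates B only if no feature of A is less intense than in B. A sketch, with illustrative feature orderings standing in for the actual Fig. 2 arrows:

```python
# Sketch of the intensity partial order over G-Group features.
# Feature orderings are illustrative stand-ins for the Fig. 2 arrows.
INTENSITY = {  # feature -> values ordered from less to more intense
    "handedness": ["one", "both"],
    "extent":     ["narrow", "expansive"],
    "tension":    ["relaxed", "tense"],
}

def more_intense(a, b):
    """True iff G-Group `a` is more intense than `b`: at least one
    feature is more intense and none is less intense (else: no relation)."""
    ra = [INTENSITY[f].index(a[f]) for f in INTENSITY]
    rb = [INTENSITY[f].index(b[f]) for f in INTENSITY]
    if any(x < y for x, y in zip(ra, rb)):
        return False  # some feature is LESS intense: mixed, no relation
    return any(x > y for x, y in zip(ra, rb))

A = {"handedness": "both", "extent": "narrow", "tension": "relaxed"}
B = {"handedness": "one",  "extent": "narrow", "tension": "relaxed"}
print(more_intense(A, B), more_intense(B, A))  # True False
```

A crescendo as observed for MRR would then be a chain of G-Groups each more intense than its predecessor, followed by a sudden drop.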
3.2 Posture Analysis
            total   segment begin   segment end   engage   disengage   other
  cross       24         15              4           -          2         4
  uncross     24          4             14           3          -         3

Table 3: Posture shifts of speaker MRR.
Posture was hypothesized by Scheflen [27] to correspond to a signal moving from one topic to another. We investigated for speaker MRR when and how he moved (HK hardly changed posture at all). Specifically, we found that his crossing and uncrossing of legs followed the pattern shown in Tab. 3. Most posture shifts considered occurred at the boundary of two RST segments A and B (77%), with a speech pause of 0.2-0.5 seconds between A and B. Moreover, depending on whether the shift occurred at the end of A or at the beginning of B, MRR preferably uncrossed or crossed his legs. A second correlation, between posture shift and interaction, is only slightly indicated by the data (there was too little interaction): MRR uncrossed his legs for engaging in (getting ready for) an interaction and crossed his legs after having disengaged from an interaction.

Figure 5: Discourse specification tree.
4. APPLICATION SKETCH
As this is work in progress, we only briefly outline how our results will be integrated in an application of presentation teams by pointing at critical decision points for gesture/posture selection.

Gesture repertoires (of subcategories) are modeled as animation clips using the 3D Studio MAX animation software. Each gesture must be provided with (1) various in/out positions, (2) different postures and (3) different handedness to allow variability and posture shifts. Posture shifts must be modeled separately. Gestures are then classified and stored according to gesture category, function and timing pattern.

As a sample application, an existing presentation generation system for selling cars [2] was adapted to a book sales scenario called "BookJam". The presentation team consists of a book-selling agent (S) who presents a number of books to two buying agents (B). The ensuing dialogue, containing monologic passages of the seller and critical questions from the buyers, indirectly presents different facts and aspects of the book to an observing user in an entertaining fashion.

To generate such a dialogue, a plan-based system computes a presentation plan at run-time (a sample is shown in Fig. 5), making use of parameters that model personality and status of the agents. This plan specifies the whole agent-agent interaction, each node representing a plan operator that fired during processing. We now distinguish four node types: Surface nodes are the leaves where text and gesture actually appear in their final form. Rhetorical nodes (underlined) are those nodes that only contain child nodes of a single agent. The minimal nodes that comprise more than one agent are interactive nodes (boxed). All others are called topical (boxed, round corners) as they structure larger portions of the dialogue.
To integrate gesture/posture generation into the plan operators, different decisions are made at each node type. Topical nodes plan adaptors. Interactive nodes plan adaptors and turn-taking signals (e.g., deictics), speech act emblems and posture shifts. In rhetorical nodes, G-Groups and G-Phrase complexes are planned (determining handedness and gesture qualities), and also discourse-structuring gestures (metaphorics, emblems). Finally, on the leaves, gestures of all types can be selected according to the semantics of concomitant speech. The challenge at surface nodes is the integration of all decisions made here and in higher nodes, and the coordination of gestures and posture shifts in accordance with given timing patterns.

At the time of writing, all this belongs to future work. Furthermore, we will think about ways of evaluating the system with regard to the impact of nonverbal behavior, thus tying together empirical results and application.
5. DISCUSSION
We have identified and documented two individual repertoires of gesture that can be used in a generation system for presentation teams. As opposed to related research, we (1) have proposed the analysis of rhetorically proficient speakers and (2) have embarked on the effort to collect and analyze individual repertoires instead of generalizing over subjects. The main research tool, Anvil, was presented, as well as an annotation scheme and some results on timing, function and frequencies. A new structural layer, the G-Group, was introduced and hypothesized to coincide with rhetorical devices, though more work needs to be done on a conceptual level. Posture shifts were found at borders of larger RST segments. Their timing and form (crossing/uncrossing legs) were identified for one speaker. A sample application was outlined with first ideas on how to integrate the results.

For the future, more work on fast, easy and reliable annotation will be done. A coding manual using decision trees for form and function is being considered. Emblem definitions will be made more formal using, e.g., Bitti and Poggi's scheme for classifying holophrastic emblems by specifying (1) arguments, (2) predicate, (3) tense, and (4) attitude [5]. On the generation side we are working on architecture, 3D gesture modeling and multimodal fusion on surface nodes. Attribution experiments are planned to investigate whether some emblems have strong connotations in terms of status (high, low), attitude (positive, negative) and emotion.
6. ACKNOWLEDGMENTS
This research is supported by a doctoral scholarship from the German Science Foundation (DFG). Many thanks to Ralf Engel and Martin Klesen for gesture annotation, and to Elisabeth André for discussion and advice.
7. REFERENCES
[1] E. André and T. Rist. Presenting Through Performing: On the Use of Multiple Lifelike Characters in Knowledge-Based Presentation Systems. In Proceedings of the Second International Conference on Intelligent User Interfaces (IUI 2000), pages 1-8, 2000.
[2] E. André, T. Rist, S. van Mulken, M. Klesen, and S. Baldes. The Automated Design of Believable Dialogues for Animated Presentation Teams. In J. Cassell, J. Sullivan, S. Prevost, and E. Churchill, editors, Embodied Conversational Agents, pages 220-255. MIT Press, Cambridge, MA, 2000.
[3] M. Atkinson. Our Masters' Voices: The Language and Body Language of Politics. Methuen, London, 1984.
[4] ... animation based on movement observation and cognitive behavior models. Computer Animation Conference, 1999.
[5] P. E. R. Bitti and I. Poggi. Symbolic nonverbal behavior: Talking through gestures. In R. S. Feldman and B. Rimé, editors, Fundamentals of Nonverbal Behavior, pages 433-457. Cambridge University Press, New York, 1991.
[6] P. E. Bull. Posture and Gesture. Pergamon Press, Oxford, 1987.
[7] J. Carletta. Assessing Agreement on Classification Tasks: The Kappa Statistic. Computational Linguistics, 22(2):249-254, 1996.
[8] J. Cassell, T. Bickmore, L. Campbell, and H. Vilhjálmsson. Human Conversation as a System Framework: Designing Embodied Conversational Agents. In J. Cassell, J. Sullivan, S. Prevost, and E. Churchill, editors, Embodied Conversational Agents, pages 29-63. MIT Press, Cambridge, MA, 2000.
[9] D. Efron. Gesture and Environment. King's Crown Press, New York, 1941.
[10] P. Ekman and W. V. Friesen. The Repertoire of Nonverbal Behavior: Categories, Origins, Usage, and Coding. Semiotica, 1:49-98, 1969.
[11] P. Ekman and W. V. Friesen. Hand movements. Journal of Communication, 22:353-374, 1972.
[12] N. Freedman and S. P. Hoffman. Kinetic behavior in altered clinical states: Approach to objective analysis of motor behavior during clinical interviews. Perceptual and Motor Skills, 24:527-539, 1967.
[13] M. A. K. Halliday. Language as Social Semiotic. Edward Arnold, London, 1978.
[14] K. Johnstone. Impro for Storytellers. Routledge/Theatre Arts Books, New York, 1999.
[15] A. Kendon. Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key, editor, Nonverbal Communication and Language, pages 207-227. Mouton, The Hague, 1980.
[16] A. Kendon. Gesture and Speech: How They Interact. In J. M. Wiemann and R. P. Harrison, editors, Nonverbal Interaction, pages 13-45. Sage, London, 1983.
[17] A. Kendon. An Agenda for Gesture Studies. The Semiotic Review of Books, 7(3):8-12, 1996.
[18] M. Kipp. Anvil: a Generic Annotation Tool for Multimodal Dialogue. In Proceedings of Eurospeech 2001, 2001. (submitted).
[19] S. Kita, I. van Gijn, and H. van der Hulst. Movement phases in signs and co-speech gestures, and their transcription by human coders. In I. Wachsmuth and M. Fröhlich, editors, Gesture and Sign Language in Human-Computer Interaction, pages 23-35. Springer, 1998.
[20] W. C. Mann and S. A. Thompson. Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3):243-281, 1988.
[21] R. Martinec. Types of process in action. Semiotica, 130(3/4):243-268, 2000.
[22] D. McNeill. Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press, Chicago, 1992.
[23] C. Müller. Redebegleitende Gesten: Kulturgeschichte, Theorie, Sprachvergleich. Berlin Verlag, Berlin, 1998.
[24] I. Poggi, C. Pelachaud, and F. de Rosis. Eye Communication in a Conversational 3D Synthetic Agent. AI Communications, 2000.
[25] B. Rimé and L. Schiaratura. Gesture and speech. In R. S. Feldman and B. Rimé, editors, Fundamentals of Nonverbal Behavior, pages 239-281. Cambridge University Press, New York, 1991.
[26] H. Sacks, E. A. Schegloff, and G. Jefferson. A simplest systematics for the organization of turn-taking for conversation. Language, 50:696-735, 1974.
[27] A. E. Scheflen. The Significance of Posture in Communication Systems. Psychiatry, 26:316-331, 1964.