SeineDial: 16th Workshop on the Semantics and Pragmatics of Dialogue (SemDial)

HAL Id: hal-01138035

https://hal.archives-ouvertes.fr/hal-01138035

Submitted on 3 Apr 2015

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


SeineDial: 16th Workshop on the Semantics and Pragmatics of Dialogue (SemDial)

Sarah Brown-Schmidt, Jonathan Ginzburg, Staffan Larsson

To cite this version:

Sarah Brown-Schmidt, Jonathan Ginzburg, Staffan Larsson. SeineDial: 16th Workshop on the Semantics and Pragmatics of Dialogue (SemDial). France. 2012. hal-01138035


Proceedings of SemDial 2012 (SeineDial):

The 16th Workshop on the Semantics and Pragmatics of Dialogue

Sarah Brown-Schmidt, Jonathan Ginzburg, Staffan Larsson (eds.)


Sponsors


MARGUERITE: Il m’a assuré que tu n’avais jamais été amoureux.

JACQUES: Oh! pour cela il a dit vrai.

MARGUERITE: Quoi! Jamais de ta vie?

JACQUES: De ma vie.

MARGUERITE: Comment! à ton âge, tu ne saurais pas ce que c’est qu’une femme?

JACQUES: Pardonnez-moi, dame Marguerite.

MARGUERITE: Et qu’est-ce que c’est qu’une femme?

JACQUES: Une femme?

MARGUERITE: Oui, une femme.

JACQUES: Attendez . . .

Denis Diderot, Jacques le fataliste et son maître

We are happy to present SemDial 2012 (SeineDial), the 16th annual workshop on the Semantics and Pragmatics of Dialogue. This year’s workshop is hosted at Université Paris-Diderot, named for the great encyclopédiste, himself a great writer of dialogues. SeineDial continues the tradition of presenting high-quality talks and posters on dialogue from a variety of perspectives such as formal semantics and pragmatics, artificial intelligence, computational linguistics, and psycholinguistics.

We received 38 submissions for the main session, and each was reviewed by three experts. Sixteen talks were selected for oral presentation; the poster session hosts many of the remaining submissions, together with additional submissions that came in response to a call for late-breaking posters and demos.

We are lucky to have three world-famous researchers as invited speakers: Eve Clark, Geert-Jan Kruijff, and François Recanati. Together they represent a broad range of perspectives and disciplines. We are sure that their talks will stimulate much interest and at least some controversy. Together with the accepted talks and posters, we look forward to a productive and interactive conference.

We are grateful to the reviewers, who invested a lot of time giving very useful feedback both to the program chairs and to the authors, and to the members of the local organizing committee, Anne Abeillé, Margot Colinet, and Gregoire Winterstein, for their hard work in helping to bring the conference to fruition.

We are also very grateful to a number of organizations that provided generous financial support to SeineDial:

• CLILLAC-ARP, Université Paris-Diderot

• Laboratoire de Linguistique Formelle, Université Paris-Diderot

• The Laboratoire d’excellence LabEx-EFL (Empirical Foundations of Linguistics), Paris Sorbonne-Cité

• La région Île-de-France, through their competitive scheme Manifestations scientifiques en Île-de-France hors DIM.

Sarah Brown-Schmidt, Jonathan Ginzburg, Staffan Larsson September, 2012


Programme Committee

Sarah Brown-Schmidt

(University of Illinois at Urbana-Champaign, Co-chair)

Staffan Larsson

(Gothenburg University, Co-chair)

• Jennifer Arnold (University of North Carolina)

• Ron Artstein (Institute for Creative Technologies, LA)

• Ellen Gurman Bard (Edinburgh University)

• Luciana Benotti (Universidad Nacional de Córdoba)

• Claire Beyssade (Institut Jean Nicod, Paris)

• Nate Blaylock (IHMC)

• Johan Bos (Groningen University)

• Susan Brennan (SUNY Stony Brook)

• Mark Core (Institute for Creative Technologies, LA)

• Mariapaola D’Imperio (LPL, Aix-en-Provence)

• David Devault (Institute for Creative Technologies, LA)

• Myroslava Dzikovska (Edinburgh University)

• Jens Edlund (KTH, Stockholm)

• Heather Ferguson (University of Kent)

• Raquel Fernández (University of Amsterdam)

• Victor Ferreira (UC San Diego)

• Claire Gardent (LORIA)

• Kallirroi Georgila (Institute for Creative Technologies, LA)

• Eleni Gregoromichelaki (King’s College, London)

• Anna Hjalmarsson (KTH, Stockholm)

• Amy Isard (Edinburgh University)

• Elsi Kaiser (University of Southern California)

• Andrew Kehler (UC San Diego)

• Ruth Kempson (King’s College, London)

• Ivana Kruijff-Korbayová (DFKI, Saarbrücken)

• Alex Lascarides (Edinburgh University)

• Oliver Lemon (Heriot-Watt University)


• Danielle Matthews (Sheffield)

• Gregory Mills (Edinburgh University)


• Aliyah Morgenstern (Université Paris 3)

• Chris Potts (Stanford University)

• Laurent Prévot (LPL, Aix-en-Provence)

• Matthew Purver (Queen Mary, University of London)

• Antoine Raux (CMU)

• Hannes Rieser (Bielefeld University)

• Verena Rieser (Heriot-Watt University)

• David Schlangen (Bielefeld University)

• Gabriel Skantze (KTH, Stockholm)

• Benjamin Spector (Institut Jean Nicod, Paris)

• Matthew Stone (Rutgers)

• David Traum (Institute for Creative Technologies, LA)

• Nigel Ward (UTEP)

Organizing Committee

• Jonathan Ginzburg (chair)

• Anne Abeillé

• Margot Colinet

• Gregoire Winterstein


CONTENTS

Referential Coordination through Mental Files (p. 1)
François Recanati

Optimal Reasoning About Referential Expressions (p. 2)
Judith Degen and Michael Franke

Using a Bayesian Model of the Listener to Unveil the Dialogue Information State (p. 12)
Hendrik Buschmeier and Stefan Kopp

The Pragmatics of Aesthetic Assessment in Conversation (p. 21)
Saul Albert and Patrick G.T. Healey

A Cognitive Model for Conversation (p. 31)
Nicholas Asher and Alex Lascarides

Meanings as Proposals: a New Semantic Foundation for Gricean Pragmatics (p. 40)
Matthijs Westera

We Did What We Could: An Experimental Study of Actuality Inferences in Dialogues with Modal Verbs (p. 50)
Lori A. Moon

Children Learn Language in Conversation (p. 60)
Eve V. Clark

Cues to Turn Boundary Prediction in Adults and Preschoolers (p. 61)
Marisa Casillas and Michael C. Frank

French Questioning Declaratives: a Corpus Study (p. 70)
Anne Abeillé, Benoît Crabbé, Danièle Godard, and Jean-Marie Marandin

The Use of Gesture to Communicate about Felt Experiences (p. 80)
Nicola Plant and Patrick G.T. Healey

Dialogue Acts Annotation Scheme within Arabic Discussions (p. 88)
Samira Ben Dbabis, Fatma Mallek, Hatem Ghorbel, and Lamia Belguith

Declarative Design of Spoken Dialogue Systems with Probabilistic Rules (p. 97)
Pierre Lison

Communicating with Cost-based Implicature: a Game-Theoretic Approach to Ambiguity (p. 107)
Hannah Rohde, Scott Seyfarth, Brady Clark, Gerhard Jaeger, and Stefan Kaufmann

There Is No Common Ground in Human-Robot Interaction (p. 117)
Geert-Jan M. Kruijff

The Semantics of Feedback (p. 118)
Harry Bunt

Recovering from Non-Understanding Errors in a Conversational Dialogue System (p. 128)
Matthew Henderson, Colin Matheson, and Jon Oberlander

Processing Self-Repairs in an Incremental Type-Theoretic Dialogue System (p. 136)
Julian Hough and Matthew Purver

Modelling Strategic Conversation: the STAC Project (p. 145)
N. Asher, A. Lascarides, O. Lemon, M. Guhe, V. Rieser, P. Muller, S. Afantenos, F. Benamara, L. Vieu, P. Denis, S. Paul, S. Keizer, and C. Dégremont

Toward a Mandarin-French Corpus of Interactional Data (p. 147)
Helen K.Y. Chen, Laurent Prévot, Roxane Bertrand, Béatrice Priego-Valverde, and Philippe Blache

A Model of Intentional Communication: AIRBUS (Asymmetric Intention Recognition with Bayesian Updating of Signals) (p. 149)
J. P. de Ruiter and Chris Cummins

Spatial Descriptions in Discourse: Choosing a Perspective (p. 151)
Simon Dobnik

Modeling Referring Expressions with Bayesian Networks (p. 153)
Kotaro Funakoshi, Mikio Nakano, Takenobu Tokunaga, and Ryu Iida

Helping the Medicine Go Down: Repair and Adherence in Patient-Clinician Dialogues (p. 155)
Christine Howes, Matt Purver, Rose McCabe, Patrick G.T. Healey, and Mary Lavelle

A Spoken Dialogue Interface for Pedestrian City Exploration: Integrating Navigation, Visibility, and Question-Answering (p. 157)
Srinivasan Janarthanam, Oliver Lemon, Xingkun Liu, Phil Bartie, William Mackaness, Tiphaine Dalmas, and Jana Goetze

Influencing Reasoning in Interaction: a Model (p. 159)
Haldur Õim and Mare Koit

Rhetorical Structure for Natural Language Generation in Dialogue (p. 161)
Amy Isard and Colin Matheson

Two Semantical Conditions for Superlative Quantifiers (p. 163)
Maria Spychalska

Modelling Strategic Conversation: Model, Annotation Design and Corpus (p. 167)
Stergos Afantenos, Nicholas Asher, Farah Benamara, Anais Cadilhac, Cedric Dégremont, Pascal Denis, Markus Guhe, Simon Keizer, Alex Lascarides, Oliver Lemon, Philippe Muller, Soumya Paul,

Surprise, Deception and Fiction in Children’s Skype Conferences (p. 169)
Thomas Bliesener

A Multi-threading Extension to State-based Dialogue Management (p. 171)
Tina Klüwer and Hans Uszkoreit

Negotiation for Concern Alignment in Health Counseling Dialogues (p. 173)
Yasuhiro Katagiri, Katsuya Takanashi, Masato Ishizaki, Mika Enomoto, Yasuharu Den, and Yosuke Matsusaka

Exhuming the Procedural Common Ground: Partner-Specific Effects (p. 175)
Gregory Mills

Opponent Modelling for Optimising Strategic Dialogue (p. 177)
Verena Rieser, Oliver Lemon, and Simon Keizer

What Should I Do Now? Supporting Progress in a Serious Game (p. 179)
Lina M. Rojas-Barahona and Claire Gardent

“The Hand Is Not a Banana”: On Developing a Robot’s Grounding Facilities (p. 181)
Julia Peltason, Hannes Rieser, Sven Wachsmuth, and Britta Wrede

Quantitative Experiments on Prosodic and Discourse Units in the Corpus of Interactional Data (p. 183)
Klim Peshkov, Laurent Prévot, Roxane Bertrand, Stéphane Rauzy, and Philippe Blache

Towards Semantic Parsing in Dynamic Domains (p. 185)
Kyle Richardson and Jonas Kuhn

Why Do We Overspecify in Dialogue? An Experiment on L2 Lexical Acquisition (p. 187)
Alexandra Vorobyova, Luciana Benotti, and Frédéric Landragin


Referential Coordination through Mental Files

François Recanati
Institut Jean-Nicod, École Normale Supérieure
29 rue d’Ulm, 75005 Paris, France

recanati@ens.fr

http://www.institutnicod.org

On the standard model, linguistic communication makes it possible for the hearer to entertain the thoughts expressed by the speaker, and what makes that possible is the fact that the thoughts in question are encoded in the speaker’s words.

However, there are challenges both to the idea that communication results in the sharing of thoughts, and to the idea that it works by encoding the thoughts. After briefly reviewing the contextualist challenge, which targets the latter idea, I will turn to another challenge to the standard model, raised by singular thought.

What characterizes singular thoughts, and especially indexical thoughts (the paradigm case), is the fact that the modes of presentation through which one thinks of objects are context-bound and perspectival. Such modes of presentation are best construed as mental files exploiting (and presupposing) certain contextual relations to the reference. This poses the communication problem, first raised by Frege: if indexical thoughts are context-bound and relation-based, how is it possible to communicate them to those who are not in the same context and do not stand in the right relations to the object? Arguably, one has to give up the claim that communication involves thought sharing in such cases.

Following Frege, I will appeal to an important distinction between linguistic and psychological modes of presentation. Psychological modes of presentation are thought ingredients, while linguistic modes of presentation are encoded. Psychological modes of presentation are perspectival and context-bound: they are mental files whose role is to store information one can gain in virtue of standing in certain contextual relations to the object, and which are available only to subjects who are appropriately situated vis-à-vis the object. It follows that thoughts involving such modes of presentation are not shareable with subjects who are not in the right type of context. But linguistic modes of presentation are fixed by the conventions of the language and they are shared by all the language users. They are public and serve to coordinate mental files in communication by constraining them to contain the piece of information they encode. In this way communication takes place even though the indexical thoughts entertained by the speaker are, in some sense, private and cannot be shared by the audience. Communication no longer involves the replication of thoughts, only their coordination.

In the last part of the talk I will apply the coordination model of communication to the referential use of definite descriptions, and I will discuss a key objection based on the distinction between semantic reference and speaker’s reference.


Optimal Reasoning About Referential Expressions

Judith Degen

Dept. of Brain and Cognitive Sciences
University of Rochester

jdegen@bcs.rochester.edu

Michael Franke
ILLC, Universiteit van Amsterdam
m.franke@uva.nl

Abstract

The iterated best response (IBR) model is a game-theoretic approach to formal pragmatics that spells out pragmatic reasoning as back-and-forth reasoning about interlocutors’ rational choices and beliefs (Franke, 2011; Jäger, 2011). We investigate the comprehension and production of referential expressions within this framework. Two studies manipulating the complexity of inferences involved in comprehension (Exp. 1) and production (Exp. 2) of referential expressions show an intriguing asymmetry: comprehension performance is better than production in corresponding complex inference tasks, but worse on simpler ones. This is not predicted by standard formulations of IBR, which make categorical predictions about rational choices. We suggest that taking into account quantitative information about beliefs of reasoners results in a better fit to the data, thus calling for a revision of the game-theoretic model.

1 Introduction

Reference to objects is pivotal in communication and a central concern of linguistic pragmatics. If interlocutors were ideal reasoners, speakers would choose the most convenient referential expression that is sufficiently discriminating given the hearer’s perspective, while hearers would choose the referent for which an observed referential expression is optimal given the speaker’s perspective. But it would be folly to assume that humans are ideal reasoners, so the question is: how much do interlocutors take each other’s perspective into account when producing and interpreting referential expressions?

A lot of work has been dedicated to this issue. For example, computational linguists have investigated efficient and natural rules for generating and comprehending referential expressions (see Dale and Reiter (1995) and Golland et al. (2010) for work directly related to ours). Many empirical studies have addressed the more specific questions of whether, when and/or how hearers take speakers’ privileged information into account (Keysar et al., 2000; Keysar et al., 2003; Hanna et al., 2003; Heller et al., 2008; Brown-Schmidt et al., 2008). Also, eye-tracking studies in the visual-world paradigm have been used to investigate how quantity reasoning influences the interpretation of referential expressions (Sedivy, 2003; Grodner and Sedivy, 2011; Huang and Snedeker, 2009; Grodner et al., 2010). In recent work closely related to ours, Stiller et al. (2011) and Frank and Goodman (2012) proposed a Bayesian model of producing and comprehending referential expressions in a game setting similar to the kind we consider here. We compare these related approaches more closely in Section 6. Despite these various efforts, it is still a matter of debate whether or to what extent interlocutors routinely consider each other’s perspective.

In order to contribute to this question, we follow a recent line of experimental approaches to formal epistemology and game theory (Hedden and Zhang, 2002; Crawford and Iriberri, 2007) to investigate how much strategic back-and-forth reasoning speakers and hearers employ in abstract language games.

The tasks we investigate translate directly to the kind of signaling games that have variously been used to account for a number of pragmatic phenomena, most notably conversational implicatures (see, e.g., Parikh (2001), Benz and van Rooij (2007) or Jäger (2008)). A benchmark model of idealized step-by-step reasoning, called the iterated best response (IBR) model, exists for these games (Franke, 2011; Jäger, 2011). IBR makes concrete predictions about the depth of strategic reasoning required to “solve” different kinds of referential language games, so that by varying the difficulty of our referential tasks, it is possible to both: (i) test the predictions of IBR models of pragmatic reasoning and (ii) determine the extent to which speakers and hearers reason strategically about the use of referential expressions.

Our data show that participants perform better at reasoning tasks that IBR predicts to involve fewer inference steps. This holds for both comprehension and production. However, our data also show an interesting asymmetry: comprehension performance is better than production in corresponding complex inference tasks, but worse on simpler ones. This is not predicted by standard formulations of IBR, which make categorical predictions about rational choices. However, it is predicted by a more nuanced variation of IBR that pays attention to the quantitative information in the belief hierarchies postulated by the model.

Section 2 introduces signaling games as abstract models of referential language use. Section 3 outlines the relevant notions of IBR reasoning. Sections 4 and 5 describe our comprehension and production studies, respectively. Section 6 discusses the results.

2 Referential Language Games

If speaker and hearer share a commonly observable set T of possible referents in their immediate environment, referential communication has essentially the structure of a signaling game: the sender S knows which t ∈ T she wants to talk about, but the receiver R does not; the speaker chooses some description m; if R can identify the intended referent, communication is successful, otherwise a failure. Such a game consists of a set T (of possible referents), a set M of messages that S could use, a prior probability distribution Pr over T that captures R’s prior expectation about the most likely intended referent, and a utility function that captures the players’ preferences in the game. We assume that S and R are both interested in establishing reference, so that if t is the intended referent and t′ is R’s guess, then for some constants s > f: U(t, t′) = s if t = t′ and f otherwise. Additionally, if messages are meaningful, this is expressed by a denotation function [[m]] ⊆ T that gives the set of referents to which m is applicable (e.g., of which it is true).
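The game tuple just described lends itself to a direct encoding. Below is a minimal sketch in Python; the referent and message names, the payoff values, and the lexicon are illustrative assumptions rather than the paper's actual materials (Fig. 1 is not reproduced here):

```python
# A referential signaling game: referents T, messages M, a flat prior
# Pr over T, a success/failure utility U, and a denotation function [[m]].
# All concrete names and the lexicon are illustrative assumptions.

T = ["tt", "tc", "td"]                     # possible referents
M = ["mt", "mc", "md1", "md2"]             # messages S may use
Pr = {t: 1 / len(T) for t in T}            # flat prior over referents
s, f = 1.0, 0.0                            # payoffs: success > failure

def U(t, t_guess):
    """Both players earn s if R's guess matches S's referent, else f."""
    return s if t == t_guess else f

# [[m]] ⊆ T: the set of referents each message is true of.
denotation = {"mt": {"tc"}, "mc": {"tt", "tc"},
              "md1": {"td"}, "md2": {"td"}}

def literal_posterior(m):
    """Condition the flat prior on the denotation of m (a literal hearer)."""
    mass = {t: Pr[t] if t in denotation[m] else 0.0 for t in T}
    total = sum(mass.values())
    return {t: p / total for t, p in mass.items()}

post = literal_posterior("mc")   # uniform over {tt, tc}, zero on td
```

A purely literal hearer thus gets no further than the semantic meaning; the pragmatic refinements discussed next sharpen this posterior.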

Consider, e.g., the situations depicted in Fig. 1. There are three possible referents T = {tt, tc, td} in the form of monsters and robots wearing one accessory each that both S and R observe. Since there is no reason to prefer any referent over another, we assume that Pr is a flat distribution over T. There are also four possible messages M = {mt, mc, md1, md2} with some intuitively obvious “semantic meaning”. For example, the message mc for red hat would intuitively be applicable to either the robot tt or the green monster tc, so that [[mc]] = {tt, tc}.

Signaling games like those in Fig. 1 are the basis for the critical conditions of our experiments (see also Sections 4 and 5), where we test which referent subjects choose for a given trigger message and which message they choose for a trigger referent. Trigger items for comprehension and production experiments are marked with an asterisk in Fig. 1. Indices t, c, d stand for target, competitor and distractor respectively.

We refer to a game as in Fig. 1(a) as the simple implicature condition, because it involves a simple scalar implicature. Hearing trigger message mc, R should reason that S must have meant target state tt, and not competitor state tc, because if S had wanted to refer to the latter she could have used an unambiguous message. Conversely, when S wants to refer to trigger state tc, she should not use the true but semantically ambiguous message mc, because she has a stronger message mt. Similarly, we refer to a game as in Fig. 1(b) as the complex implicature condition, because it requires performing scalar reasoning twice in sequence (see Fig. 2 later on).


[Figure 1 (not reproduced): Target implicature conditions, panels (a) simple and (b) complex. Hearers choose one of the possible referents T = {tt, tc, td}. Speakers have message options M = {mt, mc, md1, md2}. Trigger items are indicated with asterisks: e.g., tt is the referent to be communicated on complex production trials.]

[Figure 2 (not reproduced): Qualitative predictions of the IBR model for the simple (a) and complex (b) conditions. The graphs give the set of best responses at each level of strategic reasoning as a mapping from the left to the right.]


3 IBR Reasoning

The IBR model defines two independent strands of strategic reasoning about language use: one that starts with a naïve (level-0) receiver R0 and one that starts with a naïve sender S0 (Franke, 2011; Jäger, 2011). If utilities are as indicated and priors are flat, the behavior of level-0 players is predicted to be a uniform choice over options that conform to the semantic meaning of messages: R0(m) = [[m]] and S0(t) = {m | t ∈ [[m]]}. Sophisticated player types of level k+1 play any rational choice with equal probability given a belief that the opponent player is of level k. For our experimental examples, the “light” system of Franke (2011) applies, where sophisticated types are defined as:¹

$$S_{k+1}(t) = \begin{cases} \arg\min_{m \in R_k^{-1}(t)} |R_k(m)| & \text{if } R_k^{-1}(t) \neq \emptyset \\ S_0(t) & \text{otherwise} \end{cases}$$

$$R_{k+1}(m) = \begin{cases} \arg\min_{t \in S_k^{-1}(m)} |S_k(t)| & \text{if } S_k^{-1}(m) \neq \emptyset \\ R_0(m) & \text{otherwise} \end{cases}$$

The sequences of best responses for the simple and complex games from Fig. 1 are given in Fig. 2. On this purely qualitative picture, the IBR model makes the same predictions for comprehension and production. In the simple condition, the trigger item is mapped to either target or competitor with equal chance by naïve players; all higher-level types map the trigger item to the target item with probability one. In the complex condition, the trigger items are mapped to target and competitor in levels 0 and 1 with equal probability, but uniquely to the target item for k ≥ 2.

The sequences in Fig. 2 only consider the actual best responses of S and R, but not the more nuanced quantitative information that gives rise to these. Best responses are defined as those that maximize expected utility given what the players believe about how likely each choice option would lead to communicative success. The relevant expected success probabilities are given in Table 1 for sophisticated types. (Naïve types have no or only trivial beliefs about the game.)

¹Here R_k^{-1}(t) = {m | t ∈ R_k(m)}; likewise for S_k^{-1}.

For reasons of space, we confine ourselves to the intuition behind these numbers. E.g., in the simple condition R1 believes that the trigger message is used by naïve senders who want to refer to tt or tc. But naïve senders who want to refer to tc would also use mt with probability 1/2. So, by Bayesian conditionalization, after hearing mc, R1 believes the intended referent is tt with probability 2/3.
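This one-step Bayesian conditionalization can be checked mechanically. A minimal sketch, assuming a reconstruction of the Fig. 1(a) lexicon in which mc is true of both tt and tc while tc also has the unambiguous option mt:

```python
# R1's posterior after hearing the ambiguous trigger message mc in the
# simple condition. S0 picks uniformly among the messages true of its
# referent; the behavior table below is an assumed reconstruction.

prior = {"tt": 1/3, "tc": 1/3, "td": 1/3}   # flat prior over referents
S0 = {"tt": ["mc"],                          # only mc is true of tt
      "tc": ["mt", "mc"],                    # tc has a stronger option mt
      "td": ["md1", "md2"]}

def likelihood(m, t):
    """Probability that a naive sender uses m when referring to t."""
    return 1 / len(S0[t]) if m in S0[t] else 0.0

def posterior(m):
    """Bayesian conditionalization: Pr(t | m) is proportional to Pr(t) * Pr(m | t)."""
    joint = {t: prior[t] * likelihood(m, t) for t in prior}
    total = sum(joint.values())
    return {t: p / total for t, p in joint.items()}

post = posterior("mc")
# tt: (1/3 * 1) / (1/3 * 1 + 1/3 * 1/2) = 2/3, matching the text.
```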

Notice that while R’s success expectations always sum to one (there is always exactly one intended referent), S’s success expectations need not (several messages could be believed to lead to successful communication). A further difference concerns when S and R are sure of communicative success. In the simple condition, S1 is already sure of success, but only R≥2 is. In the complex condition, R2 is already sure of success, but only S≥3 is. So, if we assume that human reasoners aim for certainty of communicative success in pragmatic reasoning, the simple condition is less demanding in production than in comprehension, while for the complex condition the reverse is the case.
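The qualitative IBR dynamics themselves are straightforward to simulate. The sketch below implements the "light" best-response definitions for the simple condition, again under an assumed reconstruction of the Fig. 1(a) lexicon rather than the paper's exact materials:

```python
# "Light" IBR (after Franke 2011) for the simple condition. R_k maps each
# message to the set of referents a level-k receiver may pick; S_k maps
# each referent to the set of messages a level-k sender may use.
# The lexicon is an assumed reconstruction of Fig. 1(a).

SEM = {"mt": {"tc"},             # unambiguous message for the competitor
       "mc": {"tt", "tc"},       # ambiguous trigger message
       "md1": {"td"}, "md2": {"td"}}
REFERENTS = {"tt", "tc", "td"}

R0 = {m: set(ts) for m, ts in SEM.items()}          # R0(m) = [[m]]
S0 = {t: {m for m, ts in SEM.items() if t in ts}    # S0(t) = {m | t in [[m]]}
      for t in REFERENTS}

def argmin(options, key):
    """All options attaining the minimal key value."""
    best = min(key(x) for x in options)
    return {x for x in options if key(x) == best}

def step(R_k, S_k):
    """One best-response round: (R_k, S_k) -> (R_{k+1}, S_{k+1})."""
    S_next, R_next = {}, {}
    for t in REFERENTS:
        msgs = {m for m in SEM if t in R_k[m]}       # R_k^{-1}(t)
        S_next[t] = argmin(msgs, lambda m: len(R_k[m])) if msgs else S0[t]
    for m in SEM:
        refs = {t for t in REFERENTS if m in S_k[t]}  # S_k^{-1}(m)
        R_next[m] = argmin(refs, lambda t: len(S_k[t])) if refs else R0[m]
    return R_next, S_next

R1, S1 = step(R0, S0)
R2, S2 = step(R1, S1)
# R1 already resolves the trigger message mc to the target tt, and S1
# avoids the ambiguous mc when referring to the competitor tc.
```

Iterating `step` further leaves the level-1 behavior fixed, consistent with the Fig. 2 prediction that all higher-level types map the trigger item to the target.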

4 Experiment 1

Exp. 1 tested participants’ behavior in a comprehension task that used instantiations of the signaling games described in Section 2.

4.1 Methods

Participants. Using Amazon’s Mechanical Turk, 30 workers were paid $0.60 to participate. All were naïve as to the purpose of the experiment, and participation was restricted to US IP addresses.

Two participants did the experiment twice. Their second run was excluded.

Procedure and Materials. Participants engaged in a referential comprehension task. On each trial they saw three objects on a display. Each object differed systematically along two dimensions: its ontological kind (robot or one of two monster species) and accessory (scarf or either blue or red hat). In addition to these three objects, participants saw a pictorial message that they were told was sent to them by a previous participant whose job it was to get them to pick out one of these three objects. They


level   simple R        simple S         complex R       complex S
1       ⟨2/3, 1/3, 0⟩   ⟨1, 1/2, 0, 0⟩   ⟨1/2, 1/2, 0⟩   ⟨1/2, 1/2, 0, 1/3⟩
2       ⟨1, 0, 0⟩       ⟨1, 0, 0, 0⟩     ⟨1, 0, 0⟩       ⟨1/2, 0, 0, 1/3⟩
3       ⟨1, 0, 0⟩       ⟨1, 0, 0, 0⟩     ⟨1, 0, 0⟩       ⟨1, 0, 0, 1/3⟩

Table 1: Success expectations for the trigger items in the simple and complex conditions. Success expectations for R are given in order ⟨tt, tc, td⟩; those for S in order ⟨mt, mc, md1, md2⟩.

were told that the previous participant was allowed to send a message expressing only one feature of a given object, and that the messages the participant could send were furthermore restricted to monsters and hats. The four expressible features were visible to participants at the bottom of the display on every trial.

Participants initially played four sender trials.

They saw three objects, one of which was highlighted with a yellow rectangle, and were asked to click on one of four pictorial messages to send to another Mechanical Turk worker to get them to pick out the highlighted object. They were told that the other worker did not know which object was highlighted but knew which messages could be sent. The four sender trials contained three unambiguous and one ambiguous trial, which functioned as fillers in the main experiment.

Participants saw 36 experimental trials, with a 2:1 ratio of fillers to critical trials. Of the 12 critical trials, 6 constituted a simple implicature situation and 6 a complex one as defined in Section 2 (see also Fig. 1).

Target position was counterbalanced (each critical trial occurred equally often in each of the 6 possible orders of target, competitor, and distractor), as were the target’s features and the number of times each message was sent. Of the 24 filler trials, half used the displays from the implicature conditions, but the target was either tc or td (as identified unambiguously by the trigger message). This was also intended to prevent learning associations of display type with the target. On the other 12 filler trials, the target was either entirely unambiguous or entirely ambiguous given the message. That is, there was either only one object with the feature denoted by the trigger message, or there were two identical objects that were equally viable target candidates.

Trial order was pseudo-randomized such that there were two lists (reverse order) of three blocks, where critical trials and fillers were distributed evenly over blocks. Each list began with three filler trials.

4.2 Results and Discussion

Proportions of choice types are displayed in Fig. 3(a). As expected, participants were close to ceiling in choosing the target on unambiguous filler trials but at chance on ambiguous ones. This confirms that participants understood the task. On critical implicature trials, participants’ performance was intermediate between ambiguous and unambiguous filler trials. On simple implicature trials, participants chose the target 79% of the time and the competitor 21% of the time. On complex implicature trials, the target was chosen less often (54% of the time).

To test whether the observed differences in target choices were significant, we fitted a logistic mixed-effects regression to the data. Trials on which the distractor was selected were excluded to allow for a binary outcome variable (target vs. no target choice); this led to the exclusion of 5% of the data. The model predicted the log odds of choosing a target over a competitor from a Helmert-coded CONDITION predictor, a predictor coding the TRIAL number to account for learning effects, and their interaction. Three Helmert contrasts over the four relevant critical and filler conditions were included in the model, comparing each condition with a relatively less skewed distribution against the more skewed distributions (in order: ambiguous fillers, complex implicatures, simple implicatures, unambiguous fillers). This allowed us to capture whether the differences in distributions for neighboring conditions suggested by Fig. 3(a) were significant. We included the maximal random effect structure that allowed the model to converge:² by-participant random slopes for CONDITION and TRIAL and by-item random intercepts. Results are given in Table 2.

²For the procedure that was used to generate the random effect structure, see http://hlplab.wordpress.com/2009/05/14/random-effect-structure/

                           Coef β   SE(β)     z        p
(INTERCEPT)                 1.81    0.22     8.3    <.0001
AMBIG.VS.REST              −2.56    0.45    −5.6    <.0001
COMPLEX.VS.EASIER          −3.20    0.53    −6.0    <.0001
SIMPLE.VS.UNAMBIG          −2.68    0.81    −3.3    <.001
TRIAL                       0.00    0.01     0.3     0.8
TRIAL:AMBIG.VS.REST        −0.07    0.03    −2.6    <.05
TRIAL:COMPLEX.VS.EASIER    −0.01    0.03    −0.4     0.7
TRIAL:SIMPLE.VS.UNAMBIG     0.08    0.05     1.7     0.08

Table 2: Model output of Exp. 1. AMBIG.VS.REST, COMPLEX.VS.EASIER, and SIMPLE.VS.UNAMBIG are the Helmert-coded condition contrast predictors, in order.
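For readers unfamiliar with this coding scheme, forward Helmert contrasts for k ordered levels compare each level against the mean of the later ones. The following is a generic sketch of such a contrast matrix; the scaling conventions of the authors' statistics software may differ:

```python
# Forward Helmert coding for k ordered levels: contrast j compares level j
# against the mean of all later levels, yielding k-1 contrast columns.
# A generic illustration, not the exact matrix used by the authors.

def helmert_contrasts(k):
    """Return a k x (k-1) contrast matrix as nested lists."""
    C = [[0.0] * (k - 1) for _ in range(k)]
    for j in range(k - 1):
        C[j][j] = (k - 1 - j) / (k - j)       # weight for level j itself
        for i in range(j + 1, k):
            C[i][j] = -1.0 / (k - j)          # equal negative weight after
    return C

# Condition order used in the text: ambiguous fillers, complex
# implicatures, simple implicatures, unambiguous fillers.
C = helmert_contrasts(4)
# Column 0 plays the role of AMBIG.VS.REST, column 1 of
# COMPLEX.VS.EASIER, column 2 of SIMPLE.VS.UNAMBIG. Each column sums
# to zero, and the contrasts are mutually orthogonal.
```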

All Helmert contrasts reached significance at p < .001. That is, all target/competitor distributions shown in Fig. 3(a) differ from each other.

There was no main effect of TRIAL, indicating that no learning took place overall during the course of the experiment. However, there were significant interactions, suggesting selective learning in a subset of conditions. In particular, there was a significant interaction between TRIAL and the Helmert contrast coding the difference between ambiguous fillers and the rest of the conditions (AMBIG.VS.REST, β = −.05, SE = .02, p < .05) and a marginally significant interaction between TRIAL and the Helmert contrast coding the difference between the simple implicature and unambiguous filler condition (SIMPLE.VS.UNAMBIG, β = .08, SE = .05, p = .08). Further probing the simple effects revealed that participants chose the target more frequently later in the experiment in the simple and complex conditions.

This was evidenced by a main effect of TRIAL on that subset of the data (β=.03,SE=.01,p < .05) but no interactions with condition. There were no learning effects in the ambiguous and unambiguous filler conditions; participants were at chance for am- biguous items and at ceiling for unambiguous items throughout. This suggests that at least some partici- pants became aware that there was an optimal strat- egy and began to employ it as the experiment pro- gressed.

We next address the question of whether the data supports the within-participant distributions predicted by standard IBR. Recall from Section 2 that for the simple condition, IBR predicts R0 players to have a uniform distribution over target and competitor choices and R≥1 players to choose only the target. For the complex condition, the uniform distribution is predicted for both R0 and R1 players, while only target choices are expected for R≥2 players.

This is not borne out (see Fig. 4(a)). On the one hand, there were 3 participants in the simple condition and 5 in the complex condition who chose the target on half of the trials and could thus be classified as R0 (or R1 in the complex condition). Similarly, there were 11 participants in the simple condition and one in the complex condition who chose only targets and thus behaved as sophisticated receivers according to IBR. On the other hand, the majority of participants' distributions over target and competitor choices deviated from both the uniform and the target-only distribution.

One possibility is that some participants' type shifted from Rk to Rk+1 as the experiment progressed. That is, they may have shifted from initially choosing targets and competitors at random to choosing only targets. However, while it is the case that overall more targets were chosen later in the experiment in both implicature conditions, there was nevertheless within-participant variation in choices late in the experiment inconsistent with a categorical shift. Another possibility is that the experiment was too short to observe this categorical shift.
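The receiver hierarchy discussed above can be sketched computationally. The following Python snippet is our own illustration (the labels m_A, m_B, t, c are not from the paper; IBR indexing conventions also vary across formulations): a literal receiver is uniform over the referents compatible with a message, and one sender–receiver round of best responses already yields a receiver that resolves the ambiguous message to the target, i.e. the simple implicature.

```python
import numpy as np

# Minimal simple-implicature game (our own toy labels):
# state t has feature A only, state c has features A and B.
M = np.array([[1, 1],    # m_A: literally true of t and c
              [0, 1]])   # m_B: literally true of c only

def normalize(X):
    X = X.astype(float)
    s = X.sum(axis=1, keepdims=True)
    s[s == 0] = 1.0
    return X / s

def literal_receiver(M):
    # R0: condition uniformly on the literal meaning of each message
    return normalize(M)

def sender_best_response(R):
    # Next sender level: per state, put all mass on the message(s)
    # that maximise the receiver's probability of recovering it.
    U = R.T  # states x messages: recovery probability = expected utility
    B = (U == U.max(axis=1, keepdims=True)).astype(float)
    return normalize(B)

def receiver_best_response(S):
    # Next receiver level: Bayesian inversion of the sender
    # under a uniform prior over states.
    return normalize(S.T)

R0 = literal_receiver(M)          # hearing m_A: 50/50 over t and c
S1 = sender_best_response(R0)     # state t -> m_A, state c -> m_B
R1 = receiver_best_response(S1)   # hearing m_A: target t for sure
```

The final receiver maps the ambiguous message m_A to the target with probability 1, matching the prediction that sophisticated receivers choose only targets in the simple condition.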

5 Experiment 2

Exp. 2 tested participants' behavior in a production task that used instantiations of the signaling games described in Section 2.

Figure 3: Proportions of target, competitor, and distractor choices in implicature and filler conditions (Exps. 1 & 2). Panel (a): Experiment 1; panel (b): Experiment 2. Conditions (x-axis): ambiguous filler, complex implicature, simple implicature, unambiguous filler; y-axis: proportion of choices (0.0–1.0); response types: target, competitor, distractors.

Figure 4: Distribution of participants over number of target choices in implicature conditions (Exps. 1 & 2). Panel (a): Experiment 1; panel (b): Experiment 2. X-axis: number of target responses (out of 6); y-axis: number of subjects; separate distributions for simple and complex implicatures.

5.1 Methods

Participants. Using Amazon's Mechanical Turk, 30 workers were paid $0.60 to participate under the same conditions as in Exp. 1. Data from two participants whose comments indicated that not all images displayed properly were excluded.

Procedure and Materials. The procedure was the same as on the sender trials in Exp. 1. Participants saw 36 trials with a 2:1 ratio of fillers to critical trials. There were 12 critical trials (6 simple and 6 complex implicature situations as in Fig. 1). Half of the fillers used the same displays as the implicature trials, but one of the other two objects was highlighted. This meant that the target message was either unambiguous (e.g. when the highlighted object was tt in Fig. 1(a) the target message was mc) or entirely ambiguous. The remaining 12 filler trials employed other displays with either entirely unambiguous or ambiguous target messages. Two experimental lists were created and counterbalancing was ensured as in Exp. 1.

5.2 Results and Discussion

Proportions of choice types are displayed in Fig. 3(b). As in Exp. 1, participants were close to ceiling for target message choices on unambiguous filler trials but at chance on ambiguous ones. On critical implicature trials, participants' performance was slightly different than in Exp. 1. Most notably, the distribution over target and competitor choices in the simple implicature condition was more skewed than in Exp. 1 (95% targets, 5% competitors), while it was more uniform than in Exp. 1 on complex implicature trials (50% targets, 47% competitors).

We again fitted a logistic mixed-effects regression model to the data. Trials on which the distractor messages were selected were excluded to allow for a binary outcome variable (target vs. competitor choice). This led to an exclusion of 2% of trials. In addition, the unambiguous filler condition is not included in the analysis reported here since there was only 1 non-target choice after exclusion of distractor choices, leading to unreliable model convergence. Thus, as in Exp. 1, CONDITION was entered into the model as a Helmert-coded variable but with only two contrasts, one comparing the simple implicature condition to the mean of ambiguous fillers and the complex implicature condition (SIMPLE.VS.HARDER), and another one comparing the ambiguous fillers with the complex implicatures (AMBIG.VS.COMPLEX). The model reported here further does not contain a TRIAL predictor to control for learning effects because model comparison revealed that it was not justified (χ²(1) = 0.06, p = .8). That is, there were no measurable learning effects in this experiment. We included the maximal random effects structure that allowed the model to converge: by-participant random slopes for CONDITION and by-item random intercepts.

The SIMPLE.VS.HARDER Helmert contrast reached significance (β = 3.04, SE = 0.5, p < .0001) while AMBIG.VS.COMPLEX did not (β = 0.08, SE = 0.41, p = .9). That is, there was no difference between choosing a target in the ambiguous filler condition and in the complex implicature condition, suggesting that participants were at chance in deriving complex implicatures in production. However, they were close to ceiling in choosing targets in the simple implicature condition.
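The model comparison above is a likelihood-ratio test: twice the log-likelihood difference between the nested models with and without TRIAL is referred to a χ² distribution with 1 degree of freedom. A minimal Python sketch of the p-value computation, using the closed form of the 1-df χ² survival function (not the authors' actual analysis code):

```python
import math

def chi2_sf_1df(x):
    """Survival function (p-value) of a chi-squared variate with 1 df.

    For 1 df, X = Z^2 with Z standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

# The reported statistic chi^2(1) = 0.06 corresponds to p ~ .8,
# i.e. adding TRIAL does not significantly improve the model fit.
p = chi2_sf_1df(0.06)
```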

The observed within-participant distributions are better predicted by the qualitative version of IBR than in Exp. 1 (see Fig. 4(b)). For the simple condition, IBR predicts S0 players to have a uniform distribution over target and competitor choices and S≥1 players to choose only the target. For the complex condition, the uniform distribution is predicted for both S0 and S1 players, while only target choices are expected for S≥2 players.

In the simple implicature condition, 75% of participants were perfect S1 reasoners. The remaining 25% chose almost only targets. That is, participants very consistently computed the implicature. In contrast, the bulk of participants chose targets versus competitors at random in the complex implicature condition. Only 2 participants chose the target 5 out of 6 times.

Comparing these results to the results from Exp. 1, we see the following pattern: in production the simple one-level implicatures are more readily computed than in comprehension, while the more complex two-level implicatures are more readily computed in comprehension than in production.

That is, rather than comprehension mirroring production, in this paradigm there is an asymmetry between the two. This is consistent with the quantitative interpretation of IBR (as described in Section 3) that takes into account players' uncertainty about communicative success.

6 General Discussion

In two studies using an abstract language game we investigated speakers' and hearers' strategic reasoning about referential descriptions. Most generally, our results clearly favor step-wise solution concepts like IBR over equilibrium-based solution concepts (e.g. Parikh (2001)) as predictors of participants' pragmatic reasoning: our results suggest that interlocutors do take perspective and simulate each others' beliefs, although (a) message and interpretation choice behavior is not always optimal and (b) perspective-taking decreases as the number of reasoning steps required to arrive at the optimal response, as predicted by IBR, increases.

We also found evidence for an intriguing asymmetry between production and comprehension. While not predicted by the standard formulation of the IBR model, this asymmetry is consistent with an interpretation of IBR that takes into account the uncertainty that interlocutors have about the probability of communicative success given a restricted set of message and interpretation options. This calls for a revision of the IBR model to incorporate more nuanced quantitative information. Since, moreover, there is a substantial amount of individual variation, further investigating the role of individual differences in perspective-taking (e.g. Brown-Schmidt (2009)) promises to be a fruitful avenue of further research that could inform model revisions.

It could be objected that the comparison of implicatures across experiments may be problematic due to the different nature of the tasks involved in the production vs. comprehension experiments and differences underlying the involved inference processes. However, note that the version of the IBR model that takes into account interlocutor uncertainty predicts the asymmetry between production and comprehension that we found precisely by integrating some of the differences involved in the two processes: most importantly, since conversation is modelled as a dynamic game, the sender reasons about the future behavior of the receiver, while the receiver reasons "backward", so to speak, using Bayesian conditionalization, about the most likely initial state the sender could have been in; this gives rise, as we have seen, to different predictions about when a speaker or a hearer can be absolutely certain of communicative success. How this difference is implemented mechanistically is an interesting question that merits further investigation.

Frank and Goodman (2012) report the results of an experiment using a referential game almost identical to ours and show that a particular Bayesian choice model very reliably predicts the observed data for both comprehension and production. In fact, the proposed Bayesian model is a variant of IBR reasoning that considers only a level-1 sender and a level-2 receiver, but assumes smoothed best response functions at each optimization step. In a smoothed IBR model, players' choices are stochastic, with choice probabilities proportional to expected utilities (see Rogers et al. (2009) for a general formulation of such a model in game-theoretic terms).
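A smoothed sender step of the kind just described replaces the hard argmax over messages with a soft-max choice rule. The Python fragment below is our own illustration (the rationality parameter lam and the toy receiver matrix are assumptions, not taken from either paper): as lam grows, the soft response converges to the deterministic best response of standard IBR.

```python
import numpy as np

def soft_sender(R, lam=4.0):
    # R: receiver matrix (messages x states). The sender's expected
    # utility of a message in a state is the receiver's probability
    # of recovering that state from the message.
    U = R.T                        # states x messages
    W = np.exp(lam * U)            # soft-max / quantal response
    return W / W.sum(axis=1, keepdims=True)

# Toy literal receiver for a two-state, two-message game (assumed values):
R0 = np.array([[0.5, 0.5],   # ambiguous message
               [0.0, 1.0]])  # specific message

S_soft = soft_sender(R0, lam=4.0)    # stochastic, biased towards targets
S_hard = soft_sender(R0, lam=50.0)   # approaches the hard best response
```

With moderate lam, the sender still occasionally picks suboptimal messages, which is what allows such a model to fit the non-categorical response distributions observed in the experiments.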

This suggests a straightforward agenda for future work: combining our approach and that of Frank and Goodman (2012), smoothed IBR models that allow various strategic types for speakers and listeners should be further tested on empirical data.

In related work investigating comprehenders' capacity for deriving ad hoc scalar implicatures, Stiller et al. (2011) found that subjects could draw simple implicatures of the type we report above in a setup very similar to ours, but failed to draw complex ones. In contrast, our comprehenders performed above chance in the complex condition (albeit only slightly so). One possible explanation for this difference is that unlike Stiller et al. (2011), we restricted the set of message alternatives and also made it explicit to participants that a message could only denote one feature. This highlights the importance of (mutual knowledge of) the set of alternatives assumed by interlocutors in a particular communicative setting. While we restricted this set explicitly, in natural dialogue there is likely a variety of factors that determine what constitutes an alternative.

This suggests that future extensions of this work should move towards an artificial language paradigm. For example, whether a given message constitutes an alternative is likely to be affected by message complexity, which was held constant in our setup by using pictorial messages. Artificial language paradigms allow for investigating the effect of message complexity on inferences of the type reported here. Similarly, it will be important to further test the quantitative predictions made by IBR, e.g. by parametrically varying the payoffs of communicative success and failure, s and f, and the interaction thereof with message complexity.

One question that arises in connection with the restrictions we imposed on the set of available pictorial messages is the extent to which our results are transferable to natural language use. This is a legitimate concern that we would have to address empirically in future work. But notice also that, firstly, there is no a priori reason to believe that reasoning about natural language use and reasoning about our abstract referential games should necessarily differ – indeed it has been noted as early as Grice (1975) that conversational exchanges constitute but one case of rational communicative behavior. More importantly, even if reasoning about natural language were different in kind from strategic reasoning in general, the kind of strategic IBR reasoning we address here is a specific variety of reasoning that has been explicitly proposed in the literature as a model of pragmatic reasoning. The reported experiments are thus relevant in at least as far as they are the first empirical test of whether human reasoners are, in general, able to perform this kind of strategic reasoning in a task that translates the proposed pragmatic context models as directly as possible into an experimental setting.

We conclude that the studies reported are an encouraging first step towards validating game-theoretic approaches to formal pragmatics, which are well-suited to modeling pragmatic phenomena and generating quantitative, testable predictions about language use. The future challenge, as we see it, lies in fine-tuning the formal models alongside further careful empirical investigation.


Acknowledgements

We thank Gerhard Jäger, T. Florian Jaeger, and Michael K. Tanenhaus for fruitful discussion. This work was partially supported by a EURO-XPRAG grant awarded to the authors and NIH grant HD-27206 to Michael K. Tanenhaus.

References

Anton Benz and Robert van Rooij. 2007. Optimal assertions and what they implicate. Topoi, 26:63–78.

Sarah Brown-Schmidt, Christine Gunlogson, and Michael K. Tanenhaus. 2008. Addressees distinguish shared from private information when interpreting questions during interactive conversation. Cognition, 107:1122–1134.

Sarah Brown-Schmidt. 2009. The role of executive function in perspective taking during online language comprehension. Psychonomic Bulletin and Review, 16(5):893–900.

Vincent P. Crawford and Nagore Iriberri. 2007. Fatal attraction: Salience, naïveté, and sophistication in experimental "hide-and-seek" games. The American Economic Review, 97(5):1731–1750.

Robert Dale and Ehud Reiter. 1995. Computational interpretations of the Gricean maxims in the generation of referring expressions. Cognitive Science, 19(2):233–263.

Michael C. Frank and Noah D. Goodman. 2012. Predicting pragmatic reasoning in language games. Science, 336(6084):998.

Michael Franke. 2011. Quantity implicatures, exhaustive interpretation, and rational conversation. Semantics & Pragmatics, 4(1):1–82.

Dave Golland, Percy Liang, and Dan Klein. 2010. A game-theoretic approach to generating spatial descriptions. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 410–419, Cambridge, MA. Association for Computational Linguistics.

H. P. Grice. 1975. Logic and conversation. Syntax and Semantics, 3:41–58.

Daniel Grodner and Julie C. Sedivy. 2011. The effect of speaker-specific information on pragmatic inferences. In N. Pearlmutter and E. Gibson, editors, The Processing and Acquisition of Reference. MIT Press, Cambridge, MA.

Daniel Grodner, Natalie M. Klein, Kathleen M. Carbary, and Michael K. Tanenhaus. 2010. "Some", and possibly all, scalar inferences are not delayed: Evidence for immediate pragmatic enrichment. Cognition, 116:42–55.

Joy Hanna, Michael K. Tanenhaus, and John C. Trueswell. 2003. The effects of common ground and perspective on domains of referential interpretation. Journal of Memory and Language, 49:43–61.

Trey Hedden and Jun Zhang. 2002. What do you think I think you think?: Strategic reasoning in matrix games. Cognition, 85(1):1–36.

Daphna Heller, Daniel Grodner, and Michael K. Tanenhaus. 2008. The role of perspective in identifying domains of reference. Cognition, 108:831–836.

Y. Huang and Jesse Snedeker. 2009. On-line interpretation of scalar quantifiers: Insight into the semantics-pragmatics interface. Cognitive Psychology, 58:376–415.

Gerhard Jäger. 2008. Applications of game theory in linguistics. Language and Linguistics Compass, 2/3:406–421.

Gerhard Jäger. 2011. Game-theoretical pragmatics. In Johan van Benthem and Alice ter Meulen, editors, Handbook of Logic and Language, pages 467–491. Elsevier, Amsterdam.

Boaz Keysar, Dale J. Barr, and J. S. Brauner. 2000. Taking perspective in conversation: The role of mutual knowledge in comprehension. Psychological Science, 11:32–37.

Boaz Keysar, S. Lin, and Dale J. Barr. 2003. Limits on theory of mind use in adults. Cognition, 89:25–41.

Prashant Parikh. 2001. The Use of Language. CSLI Publications, Stanford University.

Brian W. Rogers, Thomas R. Palfrey, and Colin Camerer. 2009. Heterogeneous quantal response equilibrium and cognitive hierarchies. Journal of Economic Theory, 144(4):1440–1467.

Julie C. Sedivy. 2003. Pragmatic versus form-based accounts of referential contrast: Evidence for effects of informativity expectations. Journal of Psycholinguistic Research, 32:3–23.

Alex Stiller, Noah D. Goodman, and Michael C. Frank. 2011. Ad-hoc scalar implicature in adults and children. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society.


Using a Bayesian Model of the Listener to Unveil the Dialogue Information State

Hendrik Buschmeier and Stefan Kopp

Sociable Agents Group – CITEC and Faculty of Technology, Bielefeld University PO-Box 10 01 31, 33501 Bielefeld, Germany

{hbuschme, skopp}@uni-bielefeld.de

Abstract

Communicative listener feedback is a prevalent coordination mechanism in dialogue. Listeners use feedback to provide evidence of understanding to speakers, who, in turn, use it to reason about the listeners' mental state of listening, determine the groundedness of communicated information, and adapt their subsequent utterances to the listeners' needs. We describe a speaker-centric Bayesian model of listeners and their feedback behaviour, which can interpret the listener's feedback signal in its dialogue context and reason about the listener's mental state as well as the grounding status of objects in the information state.

1 Introduction

In dialogue, the interlocutor not currently holding the turn is usually not truly passive when listening to what the turn-holding interlocutor is saying. Quite the contrary: 'listeners' actively participate in the dialogue. They do so by providing communicative feedback, which, among other signals, is evidence of their perception, understanding and acceptance of and agreement to the speakers' utterances. 'Speakers' use this evidence to reason about common ground and to design their utterances to accommodate the listener's needs. This interplay makes communicative listener feedback an important mechanism for dialogue coordination and critical to dialogue success.

From a theoretical perspective, however, the interpretation of communicative feedback is a difficult problem. Feedback signals are only conventionalised to a certain degree (meaning and use might vary with the individual listener) and, as Allwood et al. (1992) argue, they are highly sensitive to the speaker's preceding utterances – and the communicative situation in general.

We present a Bayesian network model for interpreting a listener's feedback signals in their dialogue context. Taking a speaker-centric perspective, the model keeps representations of the mental 'state of listening' attributed to the listener in the form of belief states over random variables, as well as an estimation of the groundedness of the information in the speaker's utterance. To reason about these representations, the model relates the listener's feedback signal to the speaker's utterance and his expectations of the listener's reaction to it.

2 Background and related work

Feedback signals, verbal-vocal or non-verbal, are communicative acts¹ that bear meaning and serve communicative functions. Allwood et al. (1992, p. 3) identified four basic communicative functions of feedback, namely contact (being "willing and able to continue the interaction"), perception (being "willing and able to perceive the message"), understanding (being "willing and able to understand the message"), and attitudinal reactions (being "willing and able to react and (adequately) respond to the message"). It is also argued that these functions form a hierarchy such that higher functions encompass lower ones (e.g., communicating understanding implies perception, which implies being in contact). Kopp et al. (2008) extended this set of basic functions by adding acceptance/agreement (previously considered an attitudinal reaction) and by regarding expressions of emotion as attitudinal reactions.

¹Note, however, that listeners might not be (fully) aware of some of the feedback they are producing. Not all should be considered as necessarily having communicative intent (Allwood et al., 1992). Nevertheless, even such 'indicated' feedback is communicative and is often interpreted by interlocutors.

Feedback signals can likely take an infinite num- ber of forms. Although verbal-vocal feedback sig- nals, as one example, are taken from a rather small repertoire of lexical items such as ‘yes’, ‘no’, as well as non-lexical vocalisations such as ‘uh-huh’,

‘huh’, ‘oh’, ‘mm’, many variations can be produced spontaneously through generative processes such as by combination of different vocalisations or re- peating syllables (Ward, 2006). In addition, these verbalisations can be subject to significant pros- odic variation. Naturally, this continuous space of possible feedback signals can express much more than the basic functions described above.

And listeners make use of these possibilities to ex- press subtle differences in meaning (Ehlich, 1986) – which speakers are able to recognise, interpret (Stocksmeier et al., 2007; Pammi, 2011) and react to (Clark and Krych, 2004).

For a computational model of feedback production, Kopp et al. (2008) proposed a simple concept termed 'listener state.' It represents a listener's current mental state of contact, perception, understanding, acceptance and agreement as simple numerical values. The fundamental idea of this model is that the communicative function of a feedback signal encodes the listener's current mental state. An appropriate expression of this function can be retrieved by mapping the listener state onto the continuous space of feedback signals.

In previous work (Buschmeier and Kopp, 2011), we adopted the concept of listener state as a representation of a mental state that speakers in dialogue attribute to listeners through Theory of Mind. That is, we made it the result of a feedback interpretation process. We argued that such an 'attributed listener state' (ALS) is an important prerequisite to designing utterances to the immediate needs a listener communicates through feedback. The ALS captures such needs in an abstract form (e.g., is there a difficulty in perception or understanding) by describing them with a small number of variables, and is in this way similar to the "one-bit, most minimal partner model" which Galati and Brennan (2010, p. 47) propose as a representation suitable for guiding general audience design processes in dialogue.

For more specific adaptations, a speaker needs to consider more detailed information, such as the grounding status of what has been communicated (1996). Knowing whether previously conveyed information can be assumed to be part of the common ground (or even its degree of groundedness [Roque and Traum, 2008]) is important in order to estimate the success of a contribution (and initiate a repair if necessary) and to produce subsequent utterances that meet a listener's informational needs.

Analysing an inherently vague phenomenon such as feedback signals in their dialogue context is hardly possible outside a probabilistic framework. It is difficult to draw clear-cut conclusions from listener feedback, and even human annotators, who are not directly involved in the interaction, have difficulties consistently annotating feedback signals in terms of conversational functions (Geertzen et al., 2008).

A probabilistic framework well suited for reasoning about knowledge in an uncertain world is that offered by Bayesian networks. They represent knowledge in terms of 'degrees of belief', meaning that they do not hold one definite belief about the current state of the world, but represent different possible world states along with their probabilities of being true. Furthermore, Bayesian networks make it possible to model the relevant influences between random variables representing different aspects of the world in a compact model. This is why they are potentially well suited for reasoning about feedback use in dialogue. Using a Bayesian network, the conditioning influences between dialogue context, listener feedback, ALS, as well as the estimated grounding status of the speaker's utterances can be captured in a unified and well-defined probabilistic framework.

Representing grounding status not only in degrees of groundedness but also in terms of degrees of belief adds a new dimension to the approach put forth by Roque and Traum (2008). Dealing with uncertainty in the representation of common ground simplifies the interface to vague information gained from listener feedback, and removes the need to prematurely commit to a specific grounding level. This keeps the information status of an utterance open to change.

Bayesian networks have already been used to model problems similar to the one in question. Paek and Horvitz (2000), for example, use Bayesian networks to manage the uncertainties, among other things, in the model of grounding behaviour in the 'Quartet' architecture for spoken dialogue systems. Other work, on the other hand, created a Bayesian network model of dialogue system users' grounding behaviour; there the Bayesian network simulates consistent user behaviour which can be used for experimentation with, and training of, dialogue management policies. Finally, Stone and Lascarides (2010) propose to combine Bayesian networks with the logic-based Segmented Discourse Representation Theory (SDRT; Asher and Lascarides, 2010) for a theory of grounding in dialogue that is both rational (in the utility-theoretic sense) and coherent (by assigning discourse relations a prominent role in making sense of utterances).

3 A Bayesian model of the listener

A speaker's Bayesian model of a listener should relate dialogue context, listener feedback, the attributed listener state as well as the grounding status of the speaker's utterances to each other. Constructing such a model either needs corpora with fine-grained annotations of all these aspects of dialogue (to 'learn' it from data) or detailed knowledge about the relations (to design it). Apart from the fact that adequate corpora are practically non-existent, structure-learning of a Bayesian network can only infer conditional independence between variables and not their underlying causal relations. The top-ranking results of a structure learning algorithm might therefore differ substantially, resulting in networks that disagree about influences and causal relationships (Barber, 2012). For this reason, we take the approach of constructing a Bayesian network by 'hand', making – as is not uncommon in cognitive modelling – informed decisions based on research findings and intuition.

3.1 Assumed causal structure

When analysing or modelling a phenomenon with Bayesian networks, it is helpful to think of them as representing the phenomenon's underlying causal structure (Pearl, 2009). Network nodes represent causes, effects or both, and directed edges between nodes represent causality. A directed edge from a node A to a node B, for example, models that A is a cause for B, and that B is an effect of A. Another directed edge from B to a third node C makes B the cause of C. Being intermediate, it is possible that B is both an effect (of A) and a cause (of C).
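The A → B → C chain just described can be made concrete with a toy computation. All numbers below are invented for illustration only; the point is that the joint distribution factorises along the causal edges as P(A) P(B|A) P(C|B), and that inference can also run against the causal direction, from an observed effect C back to the cause A:

```python
import itertools

# Made-up conditional probability tables for binary A -> B -> C
P_A = {0: 0.7, 1: 0.3}
P_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # [a][b]
P_C_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # [b][c]

def joint(a, b, c):
    # Joint factorises along the causal edges
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

# Diagnostic inference by enumeration: P(A=1 | C=1)
num = sum(joint(1, b, 1) for b in (0, 1))
den = sum(joint(a, b, 1) for a, b in itertools.product((0, 1), repeat=2))
posterior = num / den  # greater than the prior P_A[1]: C=1 is evidence for A=1
```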

Figure 1 illustrates the causal structure of listener feedback in verbal interaction that we assume.

Figure 1: Speaker S reasoning about the mental state of listener L. S's utterances cause L to move into a certain state of understanding. This influences L's feedback signals, which are evidence for S's attributed listener state of L.

A speaker S produces an utterance in the presence of a listener L and wants to know what L's mental state of listening is towards her utterance, i.e., whether L is in contact, has perceived, understood and accepts or agrees with S's utterance. As it is impossible for S to directly observe L's mental state, she can only try to reconstruct it based on L's communicative actions (i.e., L's feedback) and by relating it to the dialogue context: her utterance, her expectations and the communicative situation.
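To illustrate this reconstruction step (S inferring L's state of listening from an observed feedback signal), here is a minimal Bayes-rule sketch. The state inventory loosely follows Allwood et al.'s hierarchy, but all numerical values and the example signal "huh?" are our own assumptions, not part of the model proposed here:

```python
# Hypothetical states of listening attributed by the speaker
states = ["contact", "perception", "understanding"]

# Prior belief before feedback, e.g. based on utterance difficulty (assumed)
prior = {"contact": 0.1, "perception": 0.3, "understanding": 0.6}

# Assumed likelihood of L producing "huh?" in each state
likelihood_huh = {"contact": 0.5, "perception": 0.4, "understanding": 0.05}

# Bayesian update: posterior proportional to prior times likelihood
unnorm = {s: prior[s] * likelihood_huh[s] for s in states}
z = sum(unnorm.values())
posterior = {s: p / z for s, p in unnorm.items()}
# After "huh?", belief mass shifts away from full understanding.
```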

To make a causally coherent argument, we assume, for the moment, that L's unobservable mental state is part of the Bayesian listener model (parts unobservable to S are drawn with grey dashed lines in Figure 1). L's mental state results from the effect of S's utterance, the communicative situation as well as L's information state. L's mental state, on the other hand, causes him to provide evidence of his understanding by producing a feedback signal. In this way closure is achieved for the causal chain from utterance, via mental state and feedback signal, to S's reconstruction 'ALS' of L's mental state.

This causally coherent model can easily be reduced to an agent-centric model for S, which consists of only those influences that S can observe directly (drawn with black solid lines in Figure 1). Although this leads to a 'gap' in the causal chain, nodes retain their roles as causes and/or effects.

It should be noted, however, that the causal model only provides the scaffolding of a more
