The role of speaker beliefs in determining accent placement


HAL Id: hal-01482602

https://hal.archives-ouvertes.fr/hal-01482602

Submitted on 22 May 2019



James German, Eyal Sagi, Stefan Kaufmann, Brady Clark

To cite this version:

James German, Eyal Sagi, Stefan Kaufmann, Brady Clark. The role of speaker beliefs in determining accent placement. A. Benz; C. Ebert; G. Jäger; R. van Rooij. Language, games, and evolution, Springer-Verlag, pp.92-116, 2011, 978-3-642-18006-4. ⟨10.1007/978-3-642-18006-4_5⟩. ⟨hal-01482602⟩


The Role of Speaker Beliefs in Determining Accent Placement

James German^∗, Eyal Sagi^†, Stefan Kaufmann^†‡, Brady Clark^†

∗Laboratoire Parole et Langage

†Northwestern University

‡University of Göttingen

April 5, 2010

1 Introduction

In English and other languages, the distribution of nuclear pitch accents¹ within a sentence usually reflects how the meaningful parts of the sentence relate to the context. Generally speaking, the nuclear pitch accent can only occur felicitously on focused parts of the sentence, corresponding to information that is not contextually retrievable or given.² In most contemporary theories, focus is formally represented by an abstract syntactic feature ‘F’.

Those parts of the sentence that are given tend to resist F-marking and thus nuclear accentuation.³ In short, there is a more or less tight coupling between (i) the contextual information status of parts of the sentence; (ii) the focus structure of the sentence (represented by the distribution of syntactic F-marking); and (iii) the actual accent placement in the phonological form.

¹ This is typically defined as the last pitch accent in an intermediate phrase (Pierrehumbert, 1980; Beckman and Pierrehumbert, 1986).

² We follow Schwarzschild’s suggestion to abandon the use of ‘new’ as a collective technical term for the complement of ‘given’ material, since doing so would intimate a homogeneity that is not there.

³ There are cases in which other constraints overrule this tendency and force prosodic prominence on given material (see Schwarzschild, 1999, for examples and discussion).


1.1 Focus projection and the grammar

A commonly encountered view holds that the relation between information status, F-marking, and accent is governed by rules that may be complicated, but are nonetheless unequivocal, even deterministic, so that relative to a particular context exactly one placement of the accent is felicitous. One can easily adduce data for which this is indeed the case, such as the question-answer pairs in (1).⁴ In each of B’s responses, the part which corresponds to the ‘wh’-phrase of the question is the focus of the answer and the only natural location of the nuclear pitch accent. Consequently, the same syntactic string has to be pronounced differently in response to different questions: While each of the answers in (1a-c) is felicitous in the context of its question, it cannot be felicitously replaced with either of the others.

(1) a. A: Who did John praise?

B: John praised MARY.

b. A: Who praised Mary?

B: JOHN praised Mary.

c. A: What did John do to Mary?

B: John PRAISED Mary.

In general, though, the correspondence between accent placement and the questions an utterance can felicitously answer is not so tight. Selkirk (1996) noted that (2), in which the accent is located on the word ‘bats’, can answer any of the questions in (2a-e).

(2) Mary bought a book about BATS.

a. What did Mary buy a book about?

b. What kind of book did Mary buy?

c. What did Mary buy?

d. What did Mary do?

e. What’s been happening?

Depending on which question the sentence is used to answer, different constituents of it are in focus (those which correspond to the ‘wh’-phrases of the respective questions), and while all of these focused constituents contain the accented word ‘bats’, they also contain additional, unaccented material in all cases except (2a).

⁴ As usual in the literature, the location of the nuclear pitch accent is typographically indicated by capitals. This convention glosses over certain details of the intonation contour and is therefore not always appropriate. It is sufficient for our purposes, however.

Selkirk and others have argued that the relationship between accent and focus is mediated through an abstract syntactic feature F. This feature must originate from an accented word, but may percolate to other constituents, subject to certain syntactic rules of focus projection (Chomsky, 1972). In Selkirk’s system, F-marking may spread (i) from an internal argument to its head, and (ii) from a head to the constituent it projects. Thus in (3) the word ‘bats’ must be accented if it is to be F-marked at all, since for syntactic reasons the F-marking could not project to ‘bats’ from any other location in the sentence. However, the F-marking can project from ‘bats’ to each of the constituents listed as F-marked in (3b-e). The corresponding questions from (2) are given on the right.

(3) a. Mary bought a book about [BATS]F. (2a)

b. Mary bought a book [about BATS]F. (2b)

c. Mary bought [a book about BATS]F. (2c)

d. Mary [bought a book about BATS]F. (2d)

e. [Mary bought a book about BATS]F. (2e)
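The two projection rules can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the tree assigned to (2), the treatment of the verb phrase as the head of the clause, and all names (`Word`, `Phrase`, `project_focus`) are hypothetical and are not part of Selkirk’s formulation. Starting from one accented word, it applies rules (i) and (ii) until nothing changes and lists the phrases that end up F-marked.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(eq=False)
class Word:
    text: str


@dataclass(eq=False)
class Phrase:
    head: Union[Word, "Phrase"]
    internal_arg: Optional["Phrase"] = None
    # All daughters in linear order (head, argument, and any other material
    # such as a subject or determiner, which passes on no F-marking here).
    children: List[Union[Word, "Phrase"]] = field(default_factory=list)


def words(node):
    """Flatten a node into its list of word strings."""
    if isinstance(node, Word):
        return [node.text]
    return [w for child in node.children for w in words(child)]


def phrases(node):
    """All phrases below and including node, smallest first."""
    if isinstance(node, Word):
        return []
    return [p for child in node.children for p in phrases(child)] + [node]


def project_focus(root, accented):
    """Strings of all phrases that can be F-marked given one accented word."""
    marked = {id(accented)}
    changed = True
    while changed:
        changed = False
        for p in phrases(root):
            arg = p.internal_arg
            # Rule (i): from an internal argument to its head.
            if arg is not None and id(arg) in marked and id(p.head) not in marked:
                marked.add(id(p.head))
                changed = True
            # Rule (ii): from a head to the constituent it projects.
            if id(p.head) in marked and id(p) not in marked:
                marked.add(id(p))
                changed = True
    return [" ".join(words(p)) for p in phrases(root) if id(p) in marked]


# Assumed structure for "Mary bought a book about bats".
mary, bought, a, book, about, bats = (
    Word(w) for w in "Mary bought a book about bats".split()
)
np_bats = Phrase(head=bats, children=[bats])
pp = Phrase(head=about, internal_arg=np_bats, children=[about, np_bats])
np_book = Phrase(head=book, internal_arg=pp, children=[a, book, pp])
vp = Phrase(head=bought, internal_arg=np_book, children=[bought, np_book])
np_mary = Phrase(head=mary, children=[mary])
clause = Phrase(head=vp, children=[np_mary, vp])  # assumption: VP heads the clause

for constituent in project_focus(clause, bats):
    print(f"[{constituent}]F")
```

Under these assumptions, an accent on ‘bats’ yields exactly the five bracketings in (3a-e), whereas an accent on the subject ‘Mary’ projects no further than the subject itself, since in this tree the subject is neither the head of the clause nor an internal argument.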

In most accounts of focus projection, the grammatical rules fully determine where the accent must fall in order to realize a given focus structure.⁵ Once the latter is fixed, there is no room for variation, let alone speaker choice.

1.2 The role of speaker choice

Two steps lead from context to accent: Contextual information status maps to focus structure, which in turn guides and constrains the placement of nuclear pitch accents. Both of these steps are frequently treated as though they were governed by deterministic grammatical principles. This seems to be an oversimplification at both levels, however.

First, regarding the relationship between information status and focus structure, some authors explicitly assume that focus is part of speakers’ communicative intentions, thus representing an active choice. For Roberts (1996), for example, the focus of a sentence indicates which question or issue the speaker takes to be the one currently under discussion; in this sense, focus is a means of keeping interlocutors’ common ground and communicative goals in alignment. For Schwarzschild (1999), focus structure is determined by a particular type of anaphoric relationship between parts of an utterance and the contents of the discourse context, which he calls Givenness. Here, speakers have some freedom in the choice of anaphoric relationships.

⁵ Kadmon (2001) and Winkler (1997) contain recent major overviews of the literature on focus projection.

Second, under certain conditions, syntactic rules constraining accent placement (such as those presented in Selkirk, 1996) are violated. German et al. (2006) showed that speakers tend to avoid placing nuclear accents on prepositions, even in contexts in which those prepositions are new and the only alternative is to place an accent on given material. Thus when uttering (4B) in the context of (4A), speakers prefer to place the nuclear accent on the direct object ‘game’ (5a) rather than the preposition ‘in’ (5b), even though the preposition is the only new information in the clause.

(4) A. I noticed that Liz and Sally really like to play their game.

B. Unfortunately, Paul wrecked the tent that they play their game in.

Interestingly, though, the avoidance of accented prepositions was only partial in their study. While the overall preference was for patterns like (5a), speakers also produced the pattern in (5b), which accords with Selkirk’s focus projection rules. The design of the experiment ensured that this variability could not be attributed to speaker or experimental error.

(5) a. . . . that they play their GAME in.

b. . . . tent that they play their game IN.

German et al. account for this finding by introducing an OT-style markedness constraint into the grammar which militates against forms with nuclear accents on prepositions. To explain the variability in outcomes, they follow Anttila (1997) and Boersma and Hayes (2001) in proposing that this markedness constraint interacts probabilistically with other constraints governing the distribution of focus in the general case.

This treatment may provide as good an account of the variation as one can expect from the constraints considered by German et al. (2006), but there are reasons to doubt that it actually explains what is going on. The observation is that speakers can use a form (the one with the accent on the direct object) with a focus structure with which it is not conventionally associated. Modifying the grammatical principles to accommodate this fact would seem to imply that the form in question can in some sense mean the same as the one that is conventionally associated with the focus structure in question. But intuitively, the deviating form is “pressed into service,” so to speak, despite the fact that it does not mean the same.

In this paper, we propose instead to treat focus projection rules such as those presented in Selkirk (1996) as just one factor among several that influence accent placement, and ask whether and under what circumstances it may be safe or even advantageous for speakers to violate those rules.

Specifically, we conjecture that the observations of German et al. are not due to random variation after all, but rather to factors which were not represented in their model and, consequently, not controlled for in their experiment.⁶

In a nutshell, our proposal is that speakers’ beliefs about hearers’ expectations play a role in determining when to use certain accent patterns. On the one hand, speakers’ tendency to avoid accenting prepositions is due to a cost associated with the effort involved in using such a form. On the other hand, in certain contexts the hearer can “guess” the information structure of a sentence independently of the accent pattern it carries. In cases like (4), the speaker’s choice comes down to a tradeoff: If there is a substantial risk that the hearer would choose the wrong interpretation without the information carried by the accent pattern, then the speaker will pay the extra cost and accent the preposition. If the risk of miscommunication is low, however, the speaker will tend to avoid accenting the preposition.

We formalize this tradeoff in a signaling game and explore the resulting predictions in terms of either Nash equilibrium strategies or Pareto-Nash dominant strategies (Parikh, 2001, 2010). Importantly, which strategy dominates is predicted to depend on the prior probabilities of the various information-structural interpretations under consideration. In contrast to earlier treatments of information structure and accent placement, our model does not deal with the structure of the grammar directly, but with the extent to which speakers and hearers are bound by the grammar in negotiating their respective communicative goals and preferences. Our application differs from previous game-theoretic treatments (such as those in Parikh, 2010; Benz et al., 2006) in that it is not merely concerned with pragmatic enrichment or strengthening, but with a case in which winning strategies may step outside the form-meaning mappings licensed by the grammar.

In Section 2, we discuss in some more detail the main assumptions and intuitions underlying our proposal. Section 3 presents the formal version of our model as well as its application to the key problem that this paper addresses. Section 4 discusses the main results and implications, and Section 5 concludes with some brief remarks about how the approach might be broadened and carried forward.

⁶ There are precedents for the view that pragmatic factors may override the rules of grammar, for instance in binding theory. Thus Chomsky (1981) argues that “... these contexts [e.g. contexts that license Principle C violations] do not constitute counterevidence to principle (C); rather they indicate that principle (C) may be overridden by some condition on discourse, not a very startling fact.”

2 Contextual factors in accent placement

What are the factors that may be driving speakers’ choices in cases like (4B)?

Central to our proposal is the assumption that extant theories of focus and accent placement are essentially right (e.g., that the accent “belongs” on the preposition in (4B)), and that the role of information structure in determining accent placement is part of the knowledge speakers and hearers bring to their interactions.

In terms of its communicative function, the placement of accents is significant for a variety of reasons. For instance, it has been argued that by placing prosodic prominence on those parts of an utterance which introduce new information, speakers draw hearers’ attention to those parts and facilitate their understanding (Schmitz, 2005). Aside from this facilitating role, accent placement is also an aid in synchronizing speakers’ and hearers’ respective beliefs about the common ground and the goals of the ongoing interaction. Seen this way, accent placement is a grounding device (Clark and Wilkes-Gibbs, 1986; Clark, 1996; see Thompson, 2009 for an overview and references).

Under what circumstances, then, would we expect the speaker to use a form that deviates from the grammatically “correct” one for the focus structure she has in mind? One such circumstance would be if that intended focus structure is highly expected independently of specific linguistic cues in the utterance itself – if, for example, the context already makes the intended focus structure highly salient or likely. In general, the more of a need the speaker feels to provide additional cues for her intended focus assignment, the stronger the incentive to use the form that the grammar licenses. Conversely, if the context provides strong cues for the intended focus assignment, accent placement loses its significance as a grounding device.

Our model predicts that it is precisely in those situations that other factors – such as prosodic preferences – may outweigh focus structure in determining speakers’ choices. On this view, it is not surprising that participants in German et al.’s study produced (5a), rather than (5b), in the context of (4A): Speakers feel free to use such “mismatched” forms whenever they can do so without risk of miscommunication.

Thus, one factor that we take to play an important role in determining speakers’ choices is the hearer’s uncertainty about the intended focus structure. In the formal model, this parameter is encoded as the hearer’s subjective probability distribution over the various focus structures the speaker may have in mind. Following standard practice in the theory of signaling games, we assume that this probability distribution is common knowledge between both interlocutors.⁷

For the above example, this predicts that the speaker’s license to produce the “mismatched” contour in (4B), with the accent placed on the direct object, in a situation in which she takes the direct object to be given, increases with the degree to which she believes its givenness to be already expected by the hearer. A case like (4B), where the expression ‘their game’ directly repeats an expression from the previous utterance, leaves little uncertainty about the speaker’s intended interpretation. Thus (4) is a very clear example of the situation we are describing: No miscommunication is likely to result from accenting the direct object.

This reasoning leads to predictions about the conditions under which speakers are allowed to produce accent patterns that do not match the intended focus structure. It does not yet explain, however, when and why speakers will actually do so. The fact that an incongruent accent placement is unlikely to result in miscommunication is not in itself a reason for preferring it over its congruent alternatives.

To address this question, we assume that in addition to the hearer’s beliefs about the speaker’s intended focus structure, three further factors play a role, all related to the effort involved in production and interpretation. In the model, they are represented as costs incurred by the interlocutors.

The first factor represents the speaker’s effort in production. Here we follow German et al. as well as much of the literature on functional theories of grammar in invoking preferences against the production of particular forms (Croft, 1990; Eckman et al., 1986; Haiman, 1985, among others). Specifically, in our example, we assume that the congruent form with accentuation on the preposition ‘in’ is dispreferred due to a general tendency against nuclear accentuation on certain function words (Ladd, 1980; Selkirk, 1995; German et al., 2006). Similarly to the optimality-theoretic approach, we assume that this tendency is always operative – thus the cost is always incurred by speakers who produce the accent on the preposition – but may be outweighed by other forces.

⁷ This may seem to be an oversimplification, since the speaker does not really have access to the hearer’s actual beliefs and may be mistaken about them. But recall that our goal is to model the factors which motivate the speaker’s choice of a form, at the time she makes her utterance. Although the speaker takes the listener’s perspective into consideration in making her decision, her choice can only be informed by what she takes to be the hearer’s beliefs, not by the hearer’s actual beliefs. Therefore, although one could devise a more complicated model which allows for the possibility that the speaker is wrong about the hearer’s actual probability distribution over various focus structures, this extension would not contribute substantially to the analysis we are concerned with.

The remaining two factors arise from a mismatch between the grammatically determined focus-to-accent mapping and the actual choices made in production and interpretation. On the one hand, the speaker incurs a cost whenever she chooses an accent pattern that is not the one specified by the grammar for the focus structure she has in mind. Essentially, the speaker prefers, all else being equal, to adhere to the grammar, and will not deviate spuriously. Similarly, the hearer incurs a cost whenever he chooses, as an interpretation, a focus structure that is not grammatically consistent with the accent pattern that the speaker has produced. Like the speaker, then, the hearer prefers not to deviate from the grammar, but may do so when other considerations apply.

We might view this extra cost, particularly as it concerns the hearer, as a processing cost incurred by extra inferences required to decide whether such a mismatch should be permitted in the given situation. A different, though perhaps not unrelated role of this cost would be in perpetuating the grammatical system throughout the population as well as diachronically. Without some formal reflex of such “grammaticality-bias” in the model, there may be no particular advantage to any one pairing between forms and meanings.

Such a framework would furnish an account that we consider bizarre, namely that the grammatical role played by nuclear accent is itself subject to variation: Depending on the contextually given probabilities, accent sometimes marks non-givenness (as is standardly the case) and sometimes givenness.

Intuitively, one of these uses is the norm from which the other deviates. The cost we stipulate is intended to account for this intuition.

Overall, interlocutors share the goal of successful communication, and both will assign a positive value to combinations of actions that lead to the hearer’s choosing the information structure that the speaker had in mind.

Speakers and hearers also prefer choices which adhere to the grammatically specified correspondence between accent patterns and information structure, and costs are incurred whenever this correspondence is contravened. Finally, speakers may incur an additional cost for using particular forms. For our examples, it is sufficient to assume that, all else being equal, utterances that include nuclear accents on prepositions incur a greater cost than ones with nuclear accents on full nouns.

Apart from the costs and benefits, our model assumes that interlocutors share certain beliefs regarding which focus structure a speaker is likely to convey for a given sentence in a given context. Formally, this is a function from (explicit) contexts to functions from sentences to probability distributions over possible focus structures. In the cases we discuss, the context and textual content of the sentence are known. Thus, for practical purposes, this feature can be reduced to a single, mutually accessible probability distribution over those focus assignments that are sensible given the syntax and lexical content of the sentence being uttered.

All of this fundamentally assumes that the grammar provides a fixed mapping between information structure and accent placement. However, we have not yet specified which version of such a grammar we are assuming. In fact, for the purposes of our analysis, a few minimal assumptions suffice.

We take the core aspects of Schwarzschild (1999) as the foundation of our simplified theory. Specifically, we assume that (i) all non-F-marked nodes in the syntax are interpreted as given, and (ii) each given node introduces a presupposition that there is an antecedent in the context with which it co-refers.⁸ Finally, we assume that nuclear accentuation introduces F-marking, which relieves the constituents in question of the givenness presupposition.⁹

In the German et al. study, the vast majority of productions fell into two basic categories: one in which the only nuclear accent in the embedded clause falls on the direct object, as illustrated in (5a), and one in which it falls on the stranded preposition, as in (5b).

(5) a. . . . that they play their GAME in.

b. . . . that they play their game IN.

There were several prosodic variations in the material preceding the embedded clause (the most common being whether the head of the relative clause received an accent), but these differences did not substantially affect the predictions that Selkirk (1996) and Schwarzschild (1999) make for the embedded clause itself. Specifically, these theories predict that a pattern like (5a) can realize a number of focus assignments, including the ones in (6): In (6a) the direct object ‘their game’ is treated as the only F-marked element, whereas in (6b) the F-marking projects to the entire verb phrase.

(6) . . . they play their GAME in

a. . . . they play [their game]F in

b. . . . they [[[play]F [their game]F]F [in]F]F

⁸ Schwarzschild in fact allows for a more inclusive notion of inferability, formally modeled in terms of entailment under existential closure. In the minimal contexts we consider, this relation does not add anaphoric possibilities beyond those available by coreference.

⁹ This does not mean that the element must not have a contextually salient antecedent, but merely that it carries no presupposition to that effect.

While (5a) is thus consistent with a number of focus structures, all of them have in common that the direct object ‘their game’ is treated as F-marked. Since that is the feature we are most interested in, we ignore the differences and collapse all of these cases into one. In contrast, for a pattern like (5b) the theories predict that the preposition ‘in’ must be interpreted as the only F-marked constituent in its clause; in particular, the pattern is predicted to be inconsistent with F-marking on the direct object.

Thus we draw the relevant distinction in terms of F-marking on the direct object vs. the preposition and adopt the notation in (7). It should be reiterated, however, that this is merely a shorthand notation and that the alignment between accent placement and F-marking is more complicated.

(7) a. . . . they play [their game]F in (5a)

b. . . . they play their game [in]F (5b)

With these preliminaries in place, we turn to the specification of our formal game-theoretic model.

3 Formal model

For our purposes, a language consists of two non-empty sets $F$ (of forms) and $M$ (of meanings). Since we are interested in a language which comes with a conventional interpretation, the formal model should also include a mapping of some kind between $F$ and $M$, such as a relation $R \subseteq F \times M$, which constrains the interpretations available for each form. But since our main point is that speakers can and do “step outside” the bounds imposed by the conventional interpretation, the conventional interpretation should be capable of interacting with and being overruled by other forces. To this end, we assume that the interpretation is given as part of the payoff structure – specifically, in terms of costs of production and interpretation.

Our example involves the speaker’s choice between the two forms in (5a) and (5b). Here we label them $f_{NP}$ and $f_{np}$, indicating the placement of the nuclear pitch accent on the noun phrase or on the preposition, respectively. The two meanings we are concerned with are $m^n$ and $m^g$, corresponding to the focus-structural status of the noun phrase as new or given (see (7a) and (7b) above). Thus under the familiar grammatical constraints on accent placement, the pairings $\langle m^n, f_{NP} \rangle$ and $\langle m^g, f_{np} \rangle$ are congruent, whereas $\langle m^n, f_{np} \rangle$ and $\langle m^g, f_{NP} \rangle$ are “mismatches.”

In our game-theoretical model, a speaker strategy is a function $\sigma$ mapping meanings to forms, and a hearer strategy is a function $\tau$ mapping forms to meanings. On each occasion of use, the speaker and the hearer choose a strategy profile.

Definition 1 (Strategies and strategy profiles). Let a language $L = \langle F, M \rangle$ be given. The set of speaker strategies for $L$, $S^s$, is the set of functions $\sigma : M \to F$. The set of hearer strategies for $L$, $S^h$, is the set of functions $\tau : F \to M$. The set of strategy profiles is $S = S^s \times S^h$.
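For the two-form, two-meaning language of our example, Definition 1 can be spelled out by brute force. The following sketch is illustrative only: the string labels and the helper `all_functions` are hypothetical names introduced here, and strategies are represented simply as Python dictionaries.

```python
from itertools import product

FORMS = ("f_NP", "f_np")     # accent on the noun phrase / on the preposition
MEANINGS = ("m_n", "m_g")    # noun phrase new / noun phrase given


def all_functions(domain, codomain):
    """Every total function from domain to codomain, as a dict."""
    return [dict(zip(domain, images))
            for images in product(codomain, repeat=len(domain))]


speaker_strategies = all_functions(MEANINGS, FORMS)   # sigma : M -> F
hearer_strategies = all_functions(FORMS, MEANINGS)    # tau   : F -> M
strategy_profiles = list(product(speaker_strategies, hearer_strategies))

print(len(speaker_strategies), len(hearer_strategies), len(strategy_profiles))
```

With two forms and two meanings there are four speaker strategies, four hearer strategies, and hence sixteen strategy profiles. The enumeration order produced here is arbitrary and carries no theoretical significance.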

The speaker utters the form which her strategy assigns to the meaning she wants to convey, and the hearer uses his strategy to map the form he receives to a meaning. We assume that the form the speaker utters and the one the hearer perceives are identical, thus there is no noise. Since σ and τ are functions, once they are fixed, the outcome of the exchange is determined by the speaker’s intended meaning.

Definition 2 (Costs and benefits). Let $L = \langle F, M \rangle$. A function $C^s_p : F \to \mathbb{R}$ assigns to each form in $F$ a cost incurred by the speaker for uttering it. Two functions $C^s_m, C^h_m : (F \times M) \to \mathbb{R}$ assign to each form-meaning pair $\langle f, m \rangle$ a cost incurred by the speaker and the hearer, respectively, for producing and interpreting $f$ as conveying $m$. The benefit of successful communication is given by a function $B : (M \times M) \to \mathbb{R}$, such that for each $m, m' \in M$, $B(m, m') > 0$ if $m = m'$, and $B(m, m') = 0$ otherwise.

In our example, the costs are represented by variables as follows. The production costs for the forms in question are $C^s_{np}$ for placing the accent on the preposition, and $C^s_{NP}$ for placing it on the noun phrase. These production costs arise due to prosodic constraints governing the respective forms and are independent of the information structure. Furthermore, $C^s_{✗}$ and $C^h_{✗}$ represent the speaker’s and hearer’s respective costs of producing and processing a “mismatched” interpretation of a nuclear pitch accent (i.e., mapping the accented constituent to given information). In contrast, $C^s_{✓}$ and $C^h_{✓}$ are the respective costs of producing and processing the “canonical” pairings which map the accented constituent to new information. Based on the above discussion, we take it that generally $C^s_{np} > C^s_{NP}$, $C^s_{✗} > C^s_{✓}$, and $C^h_{✗} > C^h_{✓}$. As we will see, the relative magnitude between these pairs of costs is more important than their absolute values.

Successful communication is rewarded by a benefit which we stipulate is positive if the meaning the hearer extracts is the same as the one the speaker intended to convey (thus communication is successful), and zero otherwise.

For all possible outcomes, the benefit is the same for both interlocutors.

Thus the game is one of coordination. This choice rules out many real-life situations, such as ones in which the speaker has an interest in misleading the hearer. The predictions of the model would change considerably in such cases, but we exclude them here because such situations lie beyond the purview of this paper.

For each linguistic encounter, the benefits and costs associated with the chosen strategy profile jointly determine its utility:

Definition 3 (Utility). Given $C^s_p$, $C^s_m$, $C^h_m$ and $B$, a utility function $U : M \times F \times M \to \mathbb{R}$ for $L$ is defined as follows: For all $m, m' \in M$ and $f \in F$,

$$U(m, f, m') = B(m, m') - C^s_p(f) - C^s_m(f, m) - C^h_m(f, m')$$
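To see how the four terms of the utility function combine, it may help to plug in numbers. The sketch below is an illustration only: every numerical value is invented, chosen solely so that accenting the preposition costs more than accenting the noun phrase and mismatches cost more than canonical pairings, and the function names are hypothetical. The benefit is also assumed, for simplicity, to be the same for both meanings.

```python
# All numbers below are hypothetical; only their ordering reflects the text.
BENEFIT = 5                                # B(m, m) when communication succeeds
PRODUCTION = {"f_NP": 1, "f_np": 5}        # C^s_p: preposition accent costs more
SPEAKER_PAIRING = {True: 0, False: 2}      # C^s_m for canonical / mismatched pairs
HEARER_PAIRING = {True: 0, False: 1}       # C^h_m for canonical / mismatched pairs

# Pairings of form and meaning that the grammar treats as congruent.
CONGRUENT = {("f_NP", "m_n"), ("f_np", "m_g")}


def benefit(m, m_heard):
    """B(m, m'): positive if the hearer recovers the intended meaning."""
    return BENEFIT if m == m_heard else 0


def production_cost(f):
    """C^s_p(f): cost to the speaker of uttering form f."""
    return PRODUCTION[f]


def speaker_pairing_cost(f, m):
    """C^s_m(f, m): cost to the speaker of using f to convey m."""
    return SPEAKER_PAIRING[(f, m) in CONGRUENT]


def hearer_pairing_cost(f, m_heard):
    """C^h_m(f, m'): cost to the hearer of interpreting f as m'."""
    return HEARER_PAIRING[(f, m_heard) in CONGRUENT]


def utility(m, f, m_heard):
    """U(m, f, m') as in Definition 3."""
    return (benefit(m, m_heard)
            - production_cost(f)
            - speaker_pairing_cost(f, m)
            - hearer_pairing_cost(f, m_heard))


# Canonical, understood:   noun phrase new, accent on NP, heard as new.
print(utility("m_n", "f_NP", "m_n"))
# Mismatched, understood:  noun phrase given, accent on NP, heard as given.
print(utility("m_g", "f_NP", "m_g"))
# Canonical form, misread: noun phrase new, accent on NP, heard as given.
print(utility("m_n", "f_NP", "m_g"))
```

Note that the speaker’s pairing cost depends on the meaning she intends, while the hearer’s pairing cost depends on the meaning he selects, so a single exchange can incur one mismatch cost without the other.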

Now, neither of the interlocutors knows the other’s choice of strategy, and the hearer only has probabilistic information about the speaker’s intended meaning. Therefore the outcome is not predictable with certainty.

However, since the hearer’s beliefs about the speaker’s intentions are common knowledge, both participants are able to calculate the expected utility of each strategy pair $\langle \sigma, \tau \rangle$ – the weighted sum of the utilities for each of the meanings the speaker may intend, where the weights are the hearer’s subjective probabilities of those meanings.

Definition 4 (Expected utility). Let $L = \langle F, M \rangle$ be a language, $U$ a utility function for $L$ and $P : M \to [0,1]$ a probability distribution over the meanings in $L$ such that for each $m \in M$, $P(m)$ is the hearer’s prior probability that the speaker intends to convey $m$. The expected utility for $L$ given $U$ and $P$ is a function $EU : S \to \mathbb{R}$ defined as follows, for all $\langle \sigma, \tau \rangle \in S$:

$$EU(\sigma, \tau) = \sum_{m \in M} P(m) \times U(m, \sigma(m), \tau(\sigma(m)))$$
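The dependence of expected utility on the prior can be illustrated numerically. The sketch below compares two profiles: a canonical one, in which speaker and hearer both follow the grammar, and a deviating one, in which the speaker always accents the noun phrase and the hearer always takes the noun phrase to be given. All cost and benefit values are invented for illustration, and the Python names are hypothetical.

```python
from fractions import Fraction

# Hypothetical numbers; only their ordering reflects the text.
BENEFIT = 5
PRODUCTION = {"f_NP": 1, "f_np": 5}
CONGRUENT = {("f_NP", "m_n"), ("f_np", "m_g")}


def utility(m, f, m_heard):
    """U(m, f, m') with invented mismatch costs of 2 (speaker) and 1 (hearer)."""
    speaker_pairing = 0 if (f, m) in CONGRUENT else 2
    hearer_pairing = 0 if (f, m_heard) in CONGRUENT else 1
    gain = BENEFIT if m == m_heard else 0
    return gain - PRODUCTION[f] - speaker_pairing - hearer_pairing


def expected_utility(sigma, tau, prior):
    """EU(sigma, tau) as in Definition 4; prior maps meanings to P(m)."""
    return sum(p * utility(m, sigma[m], tau[sigma[m]]) for m, p in prior.items())


def prior_with(p_given):
    """Prior in which the noun phrase is given with probability p_given."""
    return {"m_n": 1 - p_given, "m_g": p_given}


# Canonical profile: both interlocutors follow the grammar.
sigma_canonical = {"m_n": "f_NP", "m_g": "f_np"}
tau_canonical = {"f_NP": "m_n", "f_np": "m_g"}
# Deviating profile: always accent the NP; always interpret the NP as given.
sigma_deviating = {"m_n": "f_NP", "m_g": "f_NP"}
tau_deviating = {"f_NP": "m_g", "f_np": "m_g"}

for p_given in (Fraction(1, 2), Fraction(6, 7), Fraction(9, 10)):
    prior = prior_with(p_given)
    print(p_given,
          expected_utility(sigma_canonical, tau_canonical, prior),
          expected_utility(sigma_deviating, tau_deviating, prior))
```

With these particular invented values, the canonical profile has the higher expected utility when givenness and newness are equally likely, the two profiles tie when the probability of givenness is 6/7, and the deviating profile wins above that point. The location of the crossover is an artifact of the numbers chosen; what the sketch shows is only that such a crossover can exist.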

With the definitions so far, we have secured all the ingredients for a game in the formal sense.

Definition 5 (Game). Given a language $L$, a utility function $U$ as defined above, and a probability distribution $P$ over meanings in $L$, a (two-player) game for $L$ is a triple $a = \langle \{s, h\}, S, EU \rangle$, where $s, h$ are speaker and hearer, respectively; $S$ is the set of strategy profiles for $L$; and $EU$ is the expected utility function for $L$ given $U$ and $P$.

The most fundamental and commonly used notion for predicting which strategy profile the players of such a game will choose is that of a Nash Equilibrium. In a game of coordination like ours, there is always at least one Nash Equilibrium; in general, there may be more than one.

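Definition 6 (Nash Equilibria). The set of Nash equilibria in a game $a$ is the set

$$NE(a) = \{\langle \sigma, \tau \rangle \mid \forall \sigma' [EU(\sigma', \tau) \leq EU(\sigma, \tau)] \wedge \forall \tau' [EU(\sigma, \tau') \leq EU(\sigma, \tau)]\}$$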

The Nash equilibrium has been employed in linguistic analyses by Lewis (1969), Dekker and van Rooij (2000), and others. However, it has some limitations which have prompted some authors to look for refinements and alternatives. Parikh’s (2001, Section 4.4; 2010, Section 3.3.5) proposal is to filter the Nash equilibria in a given game by the criterion of Pareto dominance in order to eliminate “local minima” and retain only those that are closer to our intuitive notion of “best choice.” Overall, the question of appropriate solution concepts for various kinds of games is still open (cf. van Rooij, 2004:506; Parikh, 2006). Here we adopt Parikh’s strategy of Pareto-dominance as the criterion for the normative model. More specifically, we adopt the notion of weak Pareto-dominance, which ensures that if there is at least one Nash Equilibrium in the game, then there is a (not necessarily unique) Pareto-dominant one. Therefore, since our games are guaranteed to have Nash Equilibria, at least one of them has to be Pareto-dominant.

Definition 7 (Pareto-Nash Equilibria). The set of Pareto-Nash Equilibria in a game $a$ is the set

$$PNE(a) = \{\langle \sigma, \tau \rangle \mid \forall \sigma', \tau' [EU(\sigma, \tau) \le EU(\sigma', \tau') \rightarrow EU(\sigma, \tau) = EU(\sigma', \tau')]\}$$
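Definitions 6 and 7 can be illustrated with a minimal Python sketch. It assumes that the expected utilities are already available as a table indexed by strategy profile, and it reads the Pareto criterion as a filter on the Nash equilibria, following the prose description of Parikh’s proposal. The function names and the toy payoffs are hypothetical and serve only as an illustration.

```python
from itertools import product


def nash_equilibria(eu, speaker_strategies, hearer_strategies):
    """Profiles from which neither player gains by deviating alone (Definition 6)."""
    equilibria = []
    for s, h in product(speaker_strategies, hearer_strategies):
        speaker_cannot_improve = all(eu[(s2, h)] <= eu[(s, h)] for s2 in speaker_strategies)
        hearer_cannot_improve = all(eu[(s, h2)] <= eu[(s, h)] for h2 in hearer_strategies)
        if speaker_cannot_improve and hearer_cannot_improve:
            equilibria.append((s, h))
    return equilibria


def pareto_nash_equilibria(eu, speaker_strategies, hearer_strategies):
    """Nash equilibria whose expected utility no other Nash equilibrium exceeds.

    Assumption: the Pareto criterion is applied only among Nash equilibria.
    """
    ne = nash_equilibria(eu, speaker_strategies, hearer_strategies)
    return [p for p in ne if all(eu[q] <= eu[p] for q in ne)]


# Hypothetical 2x2 coordination game; the payoffs are illustrative only.
toy_eu = {("s1", "h1"): 3, ("s1", "h2"): 0, ("s2", "h1"): 0, ("s2", "h2"): 1}
toy_ne = nash_equilibria(toy_eu, ["s1", "s2"], ["h1", "h2"])
toy_pne = pareto_nash_equilibria(toy_eu, ["s1", "s2"], ["h1", "h2"])
print(toy_ne, toy_pne)
```

In the toy game both coordinated profiles are Nash equilibria, but only the one with the higher expected utility survives the Pareto filter.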

With these formal notions in place, let us now examine our example more closely. Recall that the set of meanings is $\{m^n, m^g\}$ (where the noun phrase is new and given, respectively) and the set of forms is $\{f_{NP}, f_{np}\}$ (where the accent is placed on the noun phrase or on the preposition). Table 1 lists all possible speaker and hearer strategies together with the associated costs for speaker and hearer for each possible move.

Table 1: Speaker and hearer strategies with costs incurred by each move

| Speaker strategy | Move | Speaker costs | Hearer strategy | Move | Hearer costs |
|---|---|---|---|---|---|
| $\sigma_1$ | $m^n \mapsto f_{NP}$ | $C_{NP}^s$, $C_{✓}^s$ | $\tau_1$ | $f_{NP} \mapsto m^n$ | $C_{✓}^h$ |
| | $m^g \mapsto f_{np}$ | $C_{np}^s$, $C_{✓}^s$ | | $f_{np} \mapsto m^g$ | $C_{✓}^h$ |
| $\sigma_2$ | $m^n \mapsto f_{NP}$ | $C_{NP}^s$, $C_{✓}^s$ | $\tau_2$ | $f_{NP} \mapsto m^g$ | $C_{✗}^h$ |
| | $m^g \mapsto f_{NP}$ | $C_{NP}^s$, $C_{✗}^s$ | | $f_{np} \mapsto m^g$ | $C_{✓}^h$ |
| $\sigma_3$ | $m^n \mapsto f_{np}$ | $C_{np}^s$, $C_{✗}^s$ | $\tau_3$ | $f_{NP} \mapsto m^g$ | $C_{✗}^h$ |
| | $m^g \mapsto f_{NP}$ | $C_{NP}^s$, $C_{✗}^s$ | | $f_{np} \mapsto m^n$ | $C_{✗}^h$ |
| $\sigma_4$ | $m^n \mapsto f_{np}$ | $C_{np}^s$, $C_{✗}^s$ | $\tau_4$ | $f_{NP} \mapsto m^n$ | $C_{✓}^h$ |
| | $m^g \mapsto f_{np}$ | $C_{np}^s$, $C_{✓}^s$ | | $f_{np} \mapsto m^n$ | $C_{✗}^h$ |

4 Results

In this section we present an analysis of the model in terms of the dominance relationships between the expected utilities of strategy sets under various conditions.¹⁰ We have limited our detailed analysis to just the first two rows and columns, that is, to the strategy profiles involving $\sigma_1$, $\sigma_2$, $\tau_1$ and $\tau_2$. There are several reasons for this. First of all, we feel that the relationships between these strategy sets most clearly illustrate the intuitions behind the phenomenon we are modeling. $\langle \sigma_1, \tau_1 \rangle$, for example, represents the “canonical” situation in which the speaker and hearer fully observe the rules of the grammar, thereby maximizing the benefit from successful communication and minimizing the costs from grammatical mismatches, while $\langle \sigma_2, \tau_2 \rangle$ represents what is in many ways the most interesting deviation from this pattern: The speaker avoids the extra cost associated with accenting the preposition even when the NP is given, and the hearer interprets all forms as having a given NP.

¹⁰ The relationships between the various strategy sets take the form of conditions on dominance based on the variables that the model includes. In certain cases, these conditions are mathematically non-trivial, and may seem somewhat abstract in comparison to the concrete communicative processes that we are trying to model. It should be noted that we do not mean to imply that the variables in our model, to the extent that they have a psychological reality, take on a precise numerical value that one could measure with any precision. Nevertheless, the mathematical inequalities do serve to elucidate certain broad tendencies that are likely to hold if the factors we consider have any psychological reality at all, and these are discussed where appropriate.

Strategy sets involving $\sigma_3$, $\sigma_4$, $\tau_3$ and $\tau_4$ deviate in other, sometimes interesting ways. It should be noted, in fact, that under certain conditions, the set of Nash equilibria and even Pareto-dominant strategies is not limited to the first four strategy sets. In the discussion that follows our analysis (Section 4.2), we mention such cases and discuss their implications for our model. For reasons of space, however, we leave these out of the analysis itself, and we leave it to the reader to carry out the associated mathematical proofs.

4.1 Costs and cost differentials

First of all, in comparing strategy profiles we can dispense with using costs directly (e.g., $C_{np}^s$ and $C_{NP}^s$) and operate with just the differences between them. The resulting rankings are the same because for any given strategy profile, the expected utility is just the weighted sum of the utility terms, where for each pair of related costs (e.g., $C_{np}^s$ and $C_{NP}^s$), if one is incurred with probability $x$, then the other is incurred with probability $(1 - x)$. Thus for instance, the total form cost incurred for any strategy set is described by the term in (8a), which is equivalent to (8b).

(8) a. $M \times C_{np}^s + (1 - M) \times C_{NP}^s$

b. $C_{NP}^s + M(C_{np}^s - C_{NP}^s)$

Since the lower cost term (here, $C_{NP}^s$) is constant across strategy sets, it can be ignored, for it will always be subtracted out of any comparison or inequality between two strategy sets. In other words, pairs of terms like (8a) will henceforth be replaced by terms like (9), where $D_p^s$ is the difference between the two costs.

(9) $M \times D_p^s$

With this in mind, we can write the payoff matrix as in Table 2. (For readability, the matrix is spread over two rows.)

Table 2: Payoff matrix

| | $\tau_1$ | $\tau_2$ |
|---|---|---|
| $\sigma_1$ | $B - P^g D_p^s$ | $P^g B - P^g D_p^s - P^n D_m^h$ |
| $\sigma_2$ | $P^n B - P^g D_m^s$ | $P^g B - P^g D_m^s - D_m^h$ |
| $\sigma_3$ | $-P^n D_p^s - D_m^s$ | $P^g B - P^n D_p^s - D_m^s - P^g D_m^h$ |
| $\sigma_4$ | $P^g B - D_p^s - P^n D_m^s$ | $P^g B - D_p^s - P^n D_m^s$ |

| | $\tau_3$ | $\tau_4$ |
|---|---|---|
| $\sigma_1$ | $-P^g D_p^s - D_m^h$ | $P^n B - P^g D_p^s - P^g D_m^h$ |
| $\sigma_2$ | $P^g B - P^g D_m^s - D_m^h$ | $P^n B - P^g D_m^s$ |
| $\sigma_3$ | $B - P^n D_p^s - D_m^s - D_m^h$ | $P^n B - P^n D_p^s - D_m^s - P^n D_m^h$ |
| $\sigma_4$ | $P^n B - D_p^s - P^n D_m^s - D_m^h$ | $P^n B - D_p^s - P^n D_m^s - D_m^h$ |

Fact 1. $\langle \sigma_1, \tau_1 \rangle$ dominates $\langle \sigma_1, \tau_2 \rangle$ whenever $P^g < 1$.¹¹

In descriptive terms, this just means that whenever the speaker is using a strategy that is sensitive to her intended meaning and accords with the rules of focus projection, then it is always better if the hearer uses a strategy that is sensitive to form and also accords with those rules.

¹¹ Proof. Notice first that (i) and (iv) are equivalent: (ii) is obtained by substitution from the payoff matrix, the rest follows by simple algebra (recall that $B - P^g B = P^n B$).

$$EU(\langle \sigma_1, \tau_1 \rangle) > EU(\langle \sigma_1, \tau_2 \rangle) \tag{i}$$

$$B - P^g D_p^s > P^g B - P^g D_p^s - P^n D_m^h \tag{ii}$$

$$P^n B + P^n D_m^h > 0 \tag{iii}$$

$$P^n (B + D_m^h) > 0 \tag{iv}$$

Given that $B + D_m^h > 0$, which is assumed, (iv) holds if and only if $P^n > 0$ (equivalently, $P^g < 1$).

We omit the proofs of subsequent results; they are obtained in a similar fashion.
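The entries of Table 2 can be recomputed from the strategy definitions in Table 1. The following Python sketch does so under stated assumptions: the utility of a single exchange is taken to be the benefit $B$ when the hearer recovers the intended meaning, minus $D_p^s$ when the preposition is accented, minus $D_m^s$ and $D_m^h$ for speaker and hearer mismatches with the grammar. This is a reconstruction from the two tables, not the paper’s own definition of $U$, and all names and parameter values are hypothetical.

```python
# Grammar of focus projection as encoded in Table 1:
# a new NP ("n") is marked by accenting the NP, a given NP ("g") by accenting the preposition.
GRAMMATICAL_FORM = {"n": "NP", "g": "np"}
GRAMMATICAL_MEANING = {"NP": "n", "np": "g"}

SPEAKER = {
    "s1": {"n": "NP", "g": "np"},
    "s2": {"n": "NP", "g": "NP"},
    "s3": {"n": "np", "g": "NP"},
    "s4": {"n": "np", "g": "np"},
}
HEARER = {
    "t1": {"NP": "n", "np": "g"},
    "t2": {"NP": "g", "np": "g"},
    "t3": {"NP": "g", "np": "n"},
    "t4": {"NP": "n", "np": "n"},
}


def expected_utility(s, t, p_given, B, d_p, d_ms, d_mh):
    """Expected utility of profile (s, t) in terms of cost differentials."""
    prior = {"n": 1 - p_given, "g": p_given}
    total = 0.0
    for meaning, prob in prior.items():
        form = SPEAKER[s][meaning]
        understood = HEARER[t][form]
        utility = B if understood == meaning else 0.0   # benefit of successful communication
        if form == "np":
            utility -= d_p                              # extra cost of accenting the preposition
        if form != GRAMMATICAL_FORM[meaning]:
            utility -= d_ms                             # speaker mismatch cost
        if understood != GRAMMATICAL_MEANING[form]:
            utility -= d_mh                             # hearer mismatch cost
        total += prob * utility
    return total


# Hypothetical parameter values, chosen only for illustration.
params = dict(p_given=0.75, B=10.0, d_p=2.0, d_ms=1.0, d_mh=0.5)
eu_11 = expected_utility("s1", "t1", **params)
eu_12 = expected_utility("s1", "t2", **params)
print(eu_11, eu_12, eu_11 > eu_12)   # Fact 1 predicts eu_11 > eu_12 because p_given < 1
```

With these values the computed utilities agree with the closed-form entries for $\langle \sigma_1, \tau_1 \rangle$ and $\langle \sigma_1, \tau_2 \rangle$, and the inequality in Fact 1 holds.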

Fact 2. $\langle \sigma_1, \tau_1 \rangle$ dominates $\langle \sigma_2, \tau_1 \rangle$ whenever $P^g > 0$ and $D_p^s < B + D_m^s$. Since we are assuming that $D_m^s$ is positive, under the further assumption that $D_p^s < B$ this condition will always be met.

In fact, there is good reason to assume that $D_p^s < B$ as a general fact. If the cost of accenting a preposition were greater than the benefit that could be gained from successful communication, then it would always be better to remain silent than to produce such a form. This is not what is observed, however. Speakers in the German et al. study, especially, did accent prepositions, so the cost of doing so cannot be higher than the maximum benefit that can be attained in this way.

Intuitively, this result suggests that as long as the hearer is using a strategy that is sensitive to form and conforms to the rules of focus projection, it is better for the speaker to mark her intentions in a way that also conforms to the grammar. Avoiding the cost of accenting the preposition will never sufficiently offset the risk of unsuccessful communication in such a case.

Fact 3. $\langle \sigma_2, \tau_2 \rangle$ dominates $\langle \sigma_2, \tau_1 \rangle$ whenever $P^g > D_m^h/(2B) + 0.5$.


As long as $D_m^h < B$, the right side of the inequality ranges between 0.5 and 1.¹² This means that $P^g$ must be greater than 0.5 in order for the condition to hold. In addition, for a fixed $B$, the minimum condition on $P^g$ increases linearly as a function of $D_m^h$.

Since the speaker always accents the NP in $\sigma_2$, a hearer using $\tau_2$ will always incur a mismatch cost, regardless of the speaker’s actual intention. This result then suggests that the probability of the NP being given ($P^g$) has to be high enough so that the increased chance of successful communication is sufficient to offset the hearer’s cost of always deviating from the grammar. If that probability is too low, or $D_m^h$ is too high, then it would be better for the hearer to interpret accented NPs as new, and accept miscommunication in all cases where the speaker intends the NP to be given.

Fact 4. $\langle \sigma_2, \tau_2 \rangle$ dominates $\langle \sigma_1, \tau_2 \rangle$ whenever $D_p^s > D_m^s + D_m^h$ and $P^g > 0$.

If the hearer is using an insensitive interpretive strategy, then it is preferable that the speaker use a uniform marking strategy whenever the cost of accenting the preposition is higher than the sum of the speaker and hearer mismatch costs.

This relationship is less intuitive than the others, since it raises the question of why the speaker would bother to mark the focus structure with accent placement if the hearer is not attending to form. However, it makes more sense from the standpoint of cases where the NP is given. In those cases, using $\sigma_2$ always avoids the cost of accenting the preposition. However, since $\sigma_2$ and $\tau_2$ each incur a mismatch cost when the NP is given, the cost avoided ($D_p^s$) has to offset the costs incurred ($D_m^s + D_m^h$).

Fact 5. $\langle \sigma_1, \tau_1 \rangle$ is a Nash equilibrium.

This follows straightforwardly from Facts 1 and 2 and the definition of a Nash equilibrium. When $P^g = 0$ or $P^g = 1$, this will be weakly true, since $EU(\langle \sigma_1, \tau_1 \rangle)$ will be equal to $EU(\langle \sigma_2, \tau_1 \rangle)$ and $EU(\langle \sigma_1, \tau_2 \rangle)$ respectively in those cases.

¹² See the discussion of $D_p^s$ and $B$ in the preceding paragraphs. A similar argument applies here. If $D_m^h$ were greater than $B$, then hearers would never deviate from the grammar of focus projection by interpreting an accented NP as given. They would adhere rigidly to the grammar even at a very high risk of miscommunication, in spite of any and all contextual evidence in favor of an ungrammatical interpretation. While the production data does not corroborate this assumption in the same way as for $D_p^s$, on an intuitive level, this is precisely what we are assuming licenses a speaker to contravene the rules of focus projection in the German et al. data.


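Fact 6. For $P^g > 0$, $\langle \sigma_2, \tau_2 \rangle$ is a Nash equilibrium whenever $P^g > D_m^h/(2B) + 0.5$ and $D_p^s > D_m^s + D_m^h$.

This is just the conjunction of the conditions in Facts 3 and 4. Note that when this stronger condition is met, then $\langle \sigma_1, \tau_1 \rangle$ and $\langle \sigma_2, \tau_2 \rangle$ are both Nash equilibria. When it is not met, then $\langle \sigma_1, \tau_1 \rangle$ is the only Nash equilibrium, since, except when $P^g = 1$ or $P^g = 0$, the competitors of $\langle \sigma_2, \tau_2 \rangle$, namely $\langle \sigma_1, \tau_2 \rangle$ and $\langle \sigma_2, \tau_1 \rangle$, are always dominated by $\langle \sigma_1, \tau_1 \rangle$.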
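As a rough illustration, the two conditions can be combined into a single check. The sketch below is hypothetical: the function name and the sample values are assumptions, and the check only covers unilateral deviations within the sub-game formed by the first two speaker and hearer strategies, as in the analysis above.

```python
def pooling_is_equilibrium(p_given, B, d_p, d_ms, d_mh):
    """True if the pooling profile meets both conditions for a Nash equilibrium
    within the first two rows and columns of the payoff matrix."""
    # Fact 3: the hearer does not gain by switching to the grammar-conforming strategy.
    hearer_stays = p_given > d_mh / (2 * B) + 0.5
    # Fact 4: the speaker does not gain by switching to the grammar-conforming strategy.
    speaker_stays = p_given > 0 and d_p > d_ms + d_mh
    return hearer_stays and speaker_stays


# Hypothetical values, for illustration only.
likely_given = pooling_is_equilibrium(p_given=0.9, B=10, d_p=3, d_ms=1, d_mh=1)
unlikely_given = pooling_is_equilibrium(p_given=0.52, B=10, d_p=3, d_ms=1, d_mh=1)
cheap_accent = pooling_is_equilibrium(p_given=0.9, B=10, d_p=1.5, d_ms=1, d_mh=1)
print(likely_given, unlikely_given, cheap_accent)
```

The second call fails the probability condition and the third fails the cost condition, so only the first configuration supports the pooling profile as an equilibrium.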

Fact 7. $\langle \sigma_2, \tau_2 \rangle$ strictly dominates $\langle \sigma_1, \tau_1 \rangle$ whenever $P^g > (B + D_m^h)/(B + D_p^s - D_m^s)$.

Notice that since $P^g \le 1$, this can hold only if $D_m^h \le D_p^s - D_m^s$. This is equivalent to $D_p^s \ge D_m^s + D_m^h$, which we know from Fact 4 is a prerequisite for $\langle \sigma_2, \tau_2 \rangle$ being an equilibrium strategy set in the first place. In addition, since we are assuming that $B > D_m^h$ and $B > D_p^s$, this condition implies that $P^g$ must be greater than 0.5. Beyond that, the condition on $P^g$ varies with (i) the difference between $D_m^h$ and the term $(D_p^s - D_m^s)$, and (ii) the magnitude of these two terms as a proportion of $B$. When $D_m^h$ and $(D_p^s - D_m^s)$ are relatively small as a proportion of $B$, then their difference will have little effect on the minimum condition for $P^g$, and that condition will be close to 1.0. By contrast, when those terms are large as a proportion of $B$, then their difference will have a large effect on the condition. When the difference is very small, then $P^g$ must be close to 1.0, but as the difference increases, $\langle \sigma_2, \tau_2 \rangle$ may dominate $\langle \sigma_1, \tau_1 \rangle$ at smaller values of $P^g$.

This relationship is intuitively plausible, first of all, from the standpoint of the relative size of the factors as a proportion of the benefit for successful communication. When the difference between the cost of accenting a preposition ($D_p^s$) and the speaker mismatch cost ($D_m^s$) is very small, the switch from a strategy that avoids accenting prepositions ($\sigma_2$) to one that avoids mismatch costs in exactly the same cases ($\sigma_1$) is virtually an even trade, and there is little motivation to do so except when the probability is very high that the NP is given. This relationship also makes sense from the standpoint of speaker costs versus hearer costs. If the differential just described (i.e., $D_p^s - D_m^s$) is not much bigger than the cost a hearer incurs for a mismatch, then there is little motivation to use a strategy that incurs such a cost, except when it is very likely that the NP is given. By contrast, when the cost of accenting a preposition is very high as a proportion of $B$, and the costs of both speaker and hearer mismatches are very low as a proportion of $B$, then it is desirable to use a pooling strategy that avoids production costs whenever it is even moderately more likely that the NP is given than not.
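The behavior of the threshold in Fact 7 can be made concrete with a small numerical sketch. The function name and all parameter values below are hypothetical and chosen only to illustrate the three regimes just described.

```python
def pooling_dominance_threshold(B, d_p, d_ms, d_mh):
    """Lower bound on the probability of givenness from Fact 7:
    (B + D_m^h) / (B + D_p^s - D_m^s)."""
    denominator = B + d_p - d_ms
    if denominator <= 0:
        raise ValueError("B + d_p - d_ms must be positive")
    return (B + d_mh) / denominator


# Costs small relative to B: the threshold is close to 1.0.
small_costs = pooling_dominance_threshold(B=100, d_p=3, d_ms=1, d_mh=1)
# Costs large relative to B, with a large gap between (d_p - d_ms) and d_mh.
large_gap = pooling_dominance_threshold(B=10, d_p=8, d_ms=1, d_mh=1)
# Costs large relative to B, but (d_p - d_ms) only slightly above d_mh.
small_gap = pooling_dominance_threshold(B=10, d_p=8, d_ms=6.5, d_mh=1)
print(round(small_costs, 3), round(large_gap, 3), round(small_gap, 3))
```

Under these assumed values, the pooling profile dominates at moderately high probabilities only in the second case, where the accenting cost is high and both mismatch costs are low relative to the benefit.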


4.2 Discussion

Besides the equilibria described in Facts 5, 6 and 7, there are several other interesting cases that lie outside of the four strategy profiles discussed above.

To begin with, $\langle \sigma_2, \tau_3 \rangle$ is equivalent to $\langle \sigma_2, \tau_2 \rangle$, and therefore dominates its row in exactly the same set of cases as $\langle \sigma_2, \tau_2 \rangle$. This is intuitively clear when one considers that $\sigma_2$ only generates $f_{NP}$, which $\tau_2$ and $\tau_3$ both treat in the same way. The conditions under which $\langle \sigma_2, \tau_3 \rangle$ dominates in its column, however, are rather specific and unintuitive, and it is not of much use, therefore, to discuss the conditions under which $\langle \sigma_2, \tau_3 \rangle$ forms a Nash equilibrium. To the extent that it does, however, it cannot possibly be Pareto-dominant in any cases that $\langle \sigma_2, \tau_2 \rangle$ cannot also be. Moreover, when $\langle \sigma_2, \tau_2 \rangle$ and $\langle \sigma_2, \tau_3 \rangle$ are both (weakly) Pareto-dominant, it is not possible to distinguish between them behaviorally. In fact, one might speculate that speakers and hearers do not care which hearer strategy is being employed in such cases, since half of the strategy cannot possibly be relevant for the outcome.

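It is also noteworthy that $\langle \sigma_2, \tau_4 \rangle$ is not only equivalent to $\langle \sigma_2, \tau_1 \rangle$, but also dominates its column under relatively weak sets of conditions. When the conditions described in Fact 3 are not met, such that $\langle \sigma_2, \tau_1 \rangle$ dominates $\langle \sigma_2, \tau_2 \rangle$ (and $\langle \sigma_2, \tau_3 \rangle$), $\langle \sigma_2, \tau_4 \rangle$ may be a weak Nash equilibrium. Note, however, that according to Fact 1, $\langle \sigma_1, \tau_2 \rangle$ is always dominated by $\langle \sigma_1, \tau_1 \rangle$ and can never itself be a Nash equilibrium. This implies that even when $\langle \sigma_2, \tau_4 \rangle$ is a weak Nash equilibrium, it will be Pareto-dominated by $\langle \sigma_1, \tau_1 \rangle$.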

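Finally, $\langle \sigma_3, \tau_3 \rangle$ represents a surprisingly strong strategy set in our model under a range of conditions. It is a general result (whose proof we leave to the reader) that it dominates its own row whenever $0 < P^g < 1$. It also dominates $\langle \sigma_1, \tau_3 \rangle$ and $\langle \sigma_4, \tau_3 \rangle$ whenever $B > D_m^s$, which we are assuming anyway. Finally, it dominates $\langle \sigma_2, \tau_3 \rangle$, and therefore represents a Nash equilibrium, whenever $B > D_m^s + D_p^s$, as follows from the corresponding entries of Table 2. Moreover, since $\langle \sigma_2, \tau_3 \rangle$ is equivalent to $\langle \sigma_2, \tau_2 \rangle$, $\langle \sigma_3, \tau_3 \rangle$ also Pareto-dominates $\langle \sigma_2, \tau_2 \rangle$ in those cases. Note, however, that $EU(\langle \sigma_3, \tau_3 \rangle) > EU(\langle \sigma_1, \tau_1 \rangle)$ whenever $P^g > (D_m^s + D_m^h)/(2 D_p^s) + 0.5$.

In other words, as with $\langle \sigma_2, \tau_2 \rangle$, $P^g$ must be somewhat higher than 0.5 before $\langle \sigma_3, \tau_3 \rangle$ even competes with $\langle \sigma_1, \tau_1 \rangle$ for Pareto-dominance.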

This result suggests that using a strategy set like $\langle \sigma_2, \tau_2 \rangle$, that is insensitive to both intentions and forms, may not be the only rational alternative for avoiding costly forms. Under certain conditions, it may be better to use a strategy set that actually reverses the mapping associated with the rules of focus projection. This implies that both speaker and hearer mismatch costs are incurred for every possible outcome. This extra cost is offset, however, by the fact that communication is always successful. Furthermore, although $\langle \sigma_3, \tau_3 \rangle$ does not completely avoid extra form-based costs the way that $\langle \sigma_2, \tau_2 \rangle$ does, those costs play less of a role when the NP is very likely to be given, since the more costly form is the less likely to be used in such a case.

It is not possible to know from the German et al. results whether speakers who accented given NPs were using $\langle \sigma_2, \tau_2 \rangle$/$\langle \sigma_2, \tau_3 \rangle$ or $\langle \sigma_3, \tau_3 \rangle$. As already mentioned, $\langle \sigma_2, \tau_2 \rangle$ and $\langle \sigma_2, \tau_3 \rangle$ are indistinguishable from a behavioral standpoint, so there is no data that could ever distinguish between them. $\langle \sigma_3, \tau_3 \rangle$, on the other hand, predicts a distinct set of behavioral outcomes, which could, in principle, distinguish it from $\langle \sigma_2, \tau_2 \rangle$ and $\langle \sigma_2, \tau_3 \rangle$. We leave this to future research.

In many ways, however, there is something very counterintuitive about $\langle \sigma_3, \tau_3 \rangle$, even if our model suggests that it is sometimes the most rational outcome. Does it make sense that speakers and hearers would, or ever do, temporarily negotiate a set of strategies that literally flips the grammar on its head? Temporarily resorting to a pooling strategy, on the other hand, is easier to imagine, and more closely resembles various other human behaviors (such as laziness) that have a stronger presence in popular discourse and folk psychology. Perhaps it is not unreasonable, then, to suppose that there are other biases involved that our model does not represent. In other words, while speakers and hearers may temporarily accept a slight deviation from the rules of the grammar when there are obviously costly forms to be avoided with relatively minimal risk of miscommunication, they may assign a disproportionately high cost to strategy sets that deviate too far from those rules.

5 Conclusions and future work

We conclude this paper with a brief discussion of areas in which our model goes beyond existing theories of both accent placement and game-theoretic pragmatics, some other phenomena where a similar approach would seem promising to us, and a suggestion of ways to test the predictions of the model.

5.1 Novel aspects of the model

Accent placement. As explained above, our model concerns the rationalistic factors that influence a speaker’s choice of accent placement. It does not seek to address the rules of the grammar that relate accent placement to information structure. However, some aspects of our proposal indirectly concern the architecture of that grammar. First, while we agree with the suggestion of German et al. (2006) and others that certain forms may be dispreferred despite being well-formed, we argue that such dispreferences should not be encoded in the grammar, but should be explained in terms of factors that lie outside of the grammar. Second, we argue that a mutually accessible probability distribution over possible speaker intentions (i.e., focus assignments) plays a key role in interlocutors’ selection and interpretation of accent placement. Much previous work has neglected to reconcile the explicit assumption that speakers freely choose an intended focus structure with the implicit assumption that intuitions about felicity are a reflection of grammatical constraints. In our model, we straightforwardly adopt the first assumption and propose that intuitions about felicity may be explained by the fact that interpretation is guided by a mix of forces, including mutual beliefs and expectations. This offers a way to reconcile two widespread but seemingly contradictory assumptions in the theory of focus and accent placement.

Game-theoretic pragmatics. Game theory has been particularly useful for modeling the ways in which interlocutors enrich the conventional meaning of forms. In Parikh’s (2001) analysis of scalar implicatures, for example, strategic inferences make it possible for an utterance of ‘Some of the boys went to the party’ to convey the truth-conditionally stronger meaning of the sentence ‘Some of the boys went to the party, and not all of the boys went to the party’. Notice that the latter meaning entails the former. In fact, it is typical of game-theoretic analyses that the meanings at issue are monotonically related in some way.¹³ By comparison, our analysis does not assume any particular relationship between the grammatically determined interpretation of an accent pattern and the interpretation that results from strategic inference. Our model does not require, in other words, that the pattern of F-marking in (10a) and those in (10b) and (10c)¹⁴ be related in any particular way.

¹³ Consider Parikh’s (2001) analysis of relevance implicatures, for example, in which a sentence like ‘It’s 4pm’ is enriched with the meaning of ‘Let’s go for the talk’. Even though the latter is not “semantically related” to the former in the same way that ‘Some of the boys went to the party’ is related to ‘Not all of the boys went to the party’ (Parikh, 2001, p. 93), the inferred content is monotonically added (via logical conjunction) to the conventional meaning of the utterance.

¹⁴ Recall that, because of focus projection, accentuation on the direct object is grammatically consistent with multiple patterns of F-marking.


(10) a. . . . they play their game [IN]F

b. . . . they play [their GAME]F in

c. . . . they [[[play]F [their GAME]F]F [in]F]F

Instead, the alternatives to the grammatically licensed interpretation arise merely because they correspond to different ways of assigning F-marking to the underlying syntactic representation. Certain alternatives then emerge as more relevant to the game structure because the context renders them more probable than others.

5.2 Extending the analysis

There are additional cases in which weak prosodic restrictions may be interacting with discourse-related constraints. The ones we address in this section differ from our own in a number of ways, yet we feel that there is an underlying similarity among them that warrants a common treatment.

Tone Compression in German. Languages vary in the way they treat complex intonation contours applied to monosyllables (Ladd, 1996, 132-4).

In English, rise-fall-rise contours may be associated with a single syllable, as in (11):

(11) Sue?!

L+H* L-H%

English works differently in this respect from German. Examples (12) and (13) illustrate the high-fall-rise intonation contour that marks questions in German (Ladd, 1996, 133).

(12) Ist das Ihre TÜTE?

H* L-H%

‘Is this your BAG?’

(13) #Ist das Ihr GELD?

H* L-H%

‘Is that your MONEY?’

In (12), the three tones associated with this contour are realized over two syllables,¹⁵ such that no syllable carries more than two tones. In (13), by comparison, all three tones are compressed onto a single syllable.

¹⁵ Note that ‘Tüte’ is pronounced /ˈtyːtə/.


According to Ladd (1996), (13) has a “phonetically degraded” quality, even in contexts in which it is expected to be pragmatically appropriate (e.g., someone has left some money on the table). Ladd suggests that in such cases, a speaker is likely to substitute an alternative form, such as (14), in which the three tones are realized over at least two syllables.

(14) Ist das IHR Geld?

H* L-H%

‘Is that YOUR money?’

Crucially, however, (13) and (14) do not seem to carry the same meaning.

In terms of our earlier framework, (14) marks the expression ‘Geld’ as given, and is predicted to be most appropriate when money has been explicitly mentioned in the discourse, while (13) is more appropriate otherwise. In short, the form-based preference for (14) over (13) appears to outweigh the speaker’s desire to mark the status of ‘Geld’ in the contextually most appropriate way.

This suggests an interesting twist on our analysis of stranded prepositions. In that analysis, we claimed that certain focus structures may be conveyed in spite of being inconsistent with what is required by focus projection rules. By comparison, it does not seem likely that (14) can be used to convey the meaning of (13). On the contrary, (14) intuitively seems to require that the listener accommodate the fact that ‘Geld’ is given. This suggests that the relevant tradeoff is not between form-based costs and mismatch costs as in our earlier example, but between form-based costs and the utility that the speaker associates with each of the possible meanings.

If the speaker assigns roughly equal utility to each way of assigning a status to ‘Geld’, for example, with perhaps a slight preference for treating ‘Geld’ as non-given as in (13), then there is an increased potential for factors other than context to influence the speaker’s choice. In this particular case, the preference for avoiding tone compression is sufficient to favor the pattern in (14). If, however, the speaker were to associate a much higher utility with treating ‘Geld’ as non-given as compared with treating it as given, then the speaker will prefer (13), and any preference between the two forms is unlikely to affect his or her decision.¹⁶

¹⁶ It is interesting to note that the preferred form in this case involves a nuclear accent on ‘Ihr’ ‘your’, which is a possessive pronoun and therefore a function word. To the extent that German et al.’s findings for prepositions generalize to other function word categories, this would be somewhat unexpected. In the end, however, the preference for (14) is observed independently on the basis of phonological well-formedness (however impressionistic), so this does not pose a problem for our approach. It does suggest, however, that in this