A Bayesian approach combining surface clues and linguistic knowledge: Application to the anaphora resolution problem


HAL Id: hal-00162114

https://hal.archives-ouvertes.fr/hal-00162114

Submitted on 12 Jul 2007

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Davy Weissenbacher, Adeline Nazarenko

To cite this version:

Davy Weissenbacher, Adeline Nazarenko. A Bayesian approach combining surface clues and linguistic knowledge: Application to the anaphora resolution problem. Recent Advances in Natural Language Processing, Sep 2007, Borovets, Bulgaria. 7 p. (electronic edition). ⟨hal-00162114⟩


Davy Weissenbacher, Adeline Nazarenko

Université Paris-Nord - Laboratoire d'Informatique de Paris-Nord.

99 av. J-B. Clément 93430 Villetaneuse, FRANCE

dw@lipn.univ-paris13.fr, nazarenko@lipn.univ-paris13.fr

Abstract

In NLP, a traditional distinction opposes linguistically-based systems and knowledge-poor ones, which mainly rely on surface clues. Each approach has its drawbacks and its advantages. In this paper, we propose a new method which is based on Bayesian Networks and allows both types of information to be combined. As a case study, we focus on the specific task of pronominal anaphora resolution, which is known as a difficult NLP problem. We show that our Bayesian system performs better than state-of-the-art anaphora resolution systems.

Keywords

Bayesian Network, Anaphora Resolution, linguistic knowledge, surface clue

1 Introduction

One often opposes knowledge-based and knowledge-poor Natural Language Processing (NLP) systems. The first ones exploit complex knowledge pieces which may be automatically or manually built and which are therefore not always reliable or available. The second ones rely on machine learning methods and take only surface clues into account. They give mixed results on complex NLP tasks.

This paper proposes an approach that overcomes that opposition. It relies on the Bayesian Network formalism, a probabilistic model designed for reasoning on uncertain, partial and missing information, which is still little exploited in NLP.

This approach is tested on the resolution of the anaphoric pronoun it, which is a complex task involving different types of knowledge and for which there is a clear contrast between linguistically-based methods and methods based on surface clues. We designed a system that relies on a Bayesian Network for the classification of antecedent candidates and we compare its performance with that of a state-of-the-art system, MARS, proposed by R. Mitkov [10], which can be considered as a knowledge-poor system.

The next section presents the opposition between rich and poor approaches in the case of anaphoric pronoun resolution. Section 3 describes the formalism of Bayesian Networks and its advantages for NLP, and presents our classifier for anaphora resolution. In Section 4, we compare its performance with that of several other classifiers. The last section discusses the results.

2 The opposition between linguistic knowledge and surface clues

Anaphora is a linguistic relation that holds between two textual units where one of them (the anaphor) cannot be interpreted as such but refers to the other, which usually occurs before (the antecedent). As the presence of anaphors significantly degrades the performance of NLP tasks such as information extraction or text synthesis, a lot of work has been devoted to the automatic resolution of these anaphoric relationships, i.e. the identification of the antecedents of anaphoric pronouns. In this paper, we focus on the pronoun it in English texts, which is a well-known and frequent type of anaphor.

2.1 The usefulness of surface clues

The traditional approach for anaphora resolution is composed of three steps: the distinction between anaphoric and impersonal occurrences of the pronoun (it is known that... vs. it produced...), the selection of antecedent candidates and the choice of the most plausible antecedent. For each of these steps, the first systems relied on complex linguistic knowledge that reflected the deep syntactic and semantic constraints of anaphoric relations. As these constraints seemed too complex to build automatically, the first systems relied on a set of manually designed rules, which required a thorough corpus analysis.

During the 1990's, several systems relying on surface clues were proposed to face the need for robust and less expensive anaphora resolution methods [14]. These systems got rid of the complex linguistic rules of the first ones and tried to approximate them by simple clues that are presumably more reliable and easier to compute.

For instance, [7] modifies the RAP algorithm initially proposed by [8]. Considering that a deep syntactic analysis cannot be achieved with state-of-the-art parsers, the authors implement a relaxed version of that algorithm based on shallow parsing. They show that the performance of the new algorithm is comparable to that of the first one. Another example is given in [5], which proposes to approximate the semantic constraints by co-occurrence frequencies. The antecedent is supposed to belong to the same distributional subject or object class as the anaphoric pronoun and the reported experiments show that these distributional constraints can partially substitute for deeper semantic ones.

2.2 The limits of surface clues

The surface clues proposed during the 1990's made it possible to build robust systems [10] but recent work has underlined their limits.

Since the predicate-argument schemata that improve the candidate filtering [11] are seldom available, they have been approximated by co-occurrence frequencies [5]. However, [2] shows that these frequencies do not really enhance the performance of a system that is already based on morpho-syntactic knowledge. The contribution of frequencies seems to pertain more to chance than to semantics.

Such a conclusion brings us back to the initial problem. Anaphora resolution involves complex syntactic and semantic knowledge that is not always available and which is often not fully reliable. Previous works have tried to replace linguistic knowledge with surface clues, which are easier to compute and therefore more reliable. However, these clues only partially reflect the linguistic constraints and may lead to erroneous decisions when solving ambiguous cases.

2.3 Enriching the surface clues with linguistic information

The MARS system [10] relies on surface clues to identify the most salient element in the discourse fragment preceding a pronoun occurrence. This salient element is considered as the most probable pronoun antecedent. The system relies on a part-of-speech tagging (POS tagging) of the text and applies some simple grammar rules in order to list the noun phrases (NPs) of the two sentences preceding a given pronoun occurrence and the NPs preceding the pronoun occurrence in the same sentence. For each NP associated with the pronoun occurrence, a set of constraints and preferences is applied. The constraints filter out the impersonal pronoun occurrences and the NPs that cannot be antecedents. The preferences rank the remaining NP candidates. Each preference is associated with a score, either positive or negative, and the various scores of a candidate are summed up in a global score. The antecedent with the highest score is chosen. When two candidates end with the same score, additional heuristics are used to rank them ¹.
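The preference-based ranking described above can be illustrated with a minimal Python sketch. It is not the MARS implementation: the preference names, the score values and the `Candidate` structure are hypothetical, and the tie-break is simplified to "most recent candidate wins", which is only the last fallback mentioned for MARS.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    text: str
    position: int  # token offset in the text; larger means more recent
    scores: dict = field(default_factory=dict)  # preference name -> signed score


def total_score(candidate):
    """Sum the signed preference scores of one candidate."""
    return sum(candidate.scores.values())


def choose_antecedent(candidates):
    """Return the best-scoring candidate; ties go to the most recent one."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: (total_score(c), c.position))


# Hypothetical candidates and preference scores, for illustration only.
candidates = [
    Candidate("the gene", 3, {"first_np": 1, "definite": 0}),
    Candidate("a promoter", 9, {"first_np": 0, "definite": -1}),
    Candidate("the protein", 14, {"first_np": 0, "definite": 0, "repeated": 1}),
]
best = choose_antecedent(candidates)
```

Here "the gene" and "the protein" both reach a global score of 1, so the simplified tie-break selects the more recent one.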

We propose a new system exploiting all the surface clues of MARS but also integrating the linguistic constraints that the surface clues approximate, whenever some linguistic knowledge is available. We argue that combining both types of information is beneficial. For instance, the subject of a verb is a salient element but, since the syntactic role analysis may be erroneous, it is useful to exploit in parallel the information relative to the NP location: the surface clue (the first NP of the sentence is very often the verb subject) corroborates the grammatical role hypothesis.

¹ The final ranking depends on the types of the preferences that have been used for each candidate and the most recent candidate is chosen, if nothing else applies.

Our system is modelled as a Bayesian Network. This type of representation has been designed to reason on uncertain and incomplete knowledge. It offers a probabilistic approach that unifies in a single representation deep linguistic constraints and surface clues. This unification makes it possible to corroborate linguistic constraints with the surface properties observed in corpora and to correct the errors made by the systems based on surface clues.

3 A unified approach: the Bayesian model

3.1 Classification problems

Like many other NLP tasks, distinguishing anaphoric and impersonal pronoun occurrences and, more generally, solving anaphors can be considered as classification problems [3].

Let us consider for instance the choice of the antecedent among various candidates. Let Corpus be a set of texts belonging to the same domain, Training_Corpus and Test_Corpus two distinct subsets of Corpus, and $Pronouns$ and $NounPhrases$ the sets of the pronoun and NP occurrences of $Corpus$. Let $R$ be the set of potential anaphora relationships. Each relation $r_{i,j}$ is represented as a couple $(p_i, np_j)$ of $Pronouns \times NounPhrases$, where $np_j$ is considered as a candidate antecedent of the pronoun $p_i$ ². $Antecedent$ and $Not\_Antecedent$ are two complementary subclasses of $R$. $r_{i,j}$ belongs to the class $Antecedent$ if the candidate $np_j$ is the antecedent of the pronoun occurrence $p_i$. It belongs to the class $Not\_Antecedent$ if the candidate $np_j$ is not the antecedent or if the pronoun $p_i$ is impersonal. Any couple $r_{i,j}$ is described by a vector $a = (v_1, \ldots, v_a)$ of attributes whose values are defined in $\mathbb{R}$. Each attribute $v_k$ is selected on the basis of an analysis of Training_Corpus and corresponds to either a linguistic piece of knowledge or a surface clue.

The Bayes theorem states how to predict the best class for any new couple of candidate NP and pronoun occurrence of Test_Corpus on the basis of the regularities observed on the set of couples of Training_Corpus: select the class that maximises the probability

$$P(C|E) = \frac{P(E|C)\,P(C)}{P(E)}$$

where $C \in \{Antecedent, Not\_Antecedent\}$, $E$ is an example of Test_Corpus and $P(E|C)$ is the conditional probability of observing the attribute values of $E$ given that $E$ belongs to the class $C$. That probability is estimated on the basis of the training examples.

² Actually, only the NPs occurring in the two sentences preceding the pronoun occurrence or before it in the same sentence are considered as candidates.
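The candidate window described in the footnote can be sketched as a small filter. This is only an illustration under stated assumptions: the tuple layout `(text, sentence_index, token_offset)`, the function name `candidate_nps` and the sample NPs are hypothetical and do not come from the paper.

```python
def candidate_nps(nps, pronoun_sentence, pronoun_offset, window=2):
    """Keep the NPs found in the `window` sentences before the pronoun,
    or before the pronoun inside its own sentence.

    nps: list of (text, sentence_index, token_offset) tuples (assumed layout).
    """
    selected = []
    for text, sentence, offset in nps:
        in_previous = pronoun_sentence - window <= sentence < pronoun_sentence
        before_in_same = sentence == pronoun_sentence and offset < pronoun_offset
        if in_previous or before_in_same:
            selected.append(text)
    return selected


# Hypothetical NP occurrences: (text, sentence index, token offset).
nps = [
    ("the strain", 0, 1),      # too far back: more than two sentences away
    ("minimal medium", 2, 1),  # in the previous sentence
    ("the gene", 3, 0),        # same sentence, before the pronoun
    ("glucose", 3, 7),         # same sentence, after the pronoun
]
candidates = candidate_nps(nps, pronoun_sentence=3, pronoun_offset=5)
```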

If the attributes are assumed to be independent given the class, $P(E|C)$ can be decomposed into $P(v_1|C) * \ldots * P(v_a|C)$ and the probability to maximise is

$$P(C|E) = \frac{P(C)}{P(E)} \prod_{j=1}^{a} P(v_j|C)$$

In that case, the classifier is a Naive Bayes Classifier (NBC) ³.

For any pronoun occurrence $p$ of Test_Corpus and for each couple to which it belongs, the Bayesian classifier computes the probability for that couple to belong to the class $Antecedent$. If the pronoun occurrence is anaphoric, the candidate with the highest probability is chosen as antecedent.
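The ranking of candidates by a Naive Bayes Classifier can be sketched as follows. This is a toy illustration, not the system evaluated in the paper: the attribute names, the four training couples and the add-one smoothing are assumptions introduced for the example.

```python
from collections import Counter, defaultdict


def train_nbc(examples):
    """Collect the counts needed by a Naive Bayes Classifier.

    examples: list of (attributes dict, class label) pairs.
    """
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, attribute) -> value counts
    values_seen = defaultdict(set)       # attribute -> values seen in training
    for attrs, label in examples:
        class_counts[label] += 1
        for name, value in attrs.items():
            value_counts[(label, name)][value] += 1
            values_seen[name].add(value)
    return class_counts, value_counts, values_seen


def posterior(model, attrs, label="Antecedent"):
    """P(label | attrs) under the naive independence assumption."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    joint = {}
    for c, n_c in class_counts.items():
        p = n_c / total  # P(C)
        for name, value in attrs.items():
            # P(v_j | C) with add-one smoothing (an assumption of this sketch)
            p *= (value_counts[(c, name)][value] + 1) / (n_c + len(values_seen[name]))
        joint[c] = p
    return joint[label] / sum(joint.values())  # dividing by P(E)


# Hypothetical training couples (pronoun, candidate NP).
training = [
    ({"number": "singular", "first_np": True}, "Antecedent"),
    ({"number": "singular", "first_np": False}, "Not_Antecedent"),
    ({"number": "plural", "first_np": False}, "Not_Antecedent"),
    ({"number": "plural", "first_np": True}, "Not_Antecedent"),
]
model = train_nbc(training)
candidates = {
    "np_1": {"number": "singular", "first_np": True},
    "np_2": {"number": "plural", "first_np": False},
}
best = max(candidates, key=lambda name: posterior(model, candidates[name]))
```

The candidate with the highest posterior for the class Antecedent is kept, which mirrors the selection rule stated above.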

3.2 Inferring from imperfect attributes

A Bayesian Network is a model designed for reasoning on uncertain and incomplete attributes. It is composed of a qualitative description of the attribute dependencies, an oriented acyclic graph, and of a quantitative description, a set of conditional probability tables, each random variable (RV) being associated with a graph node. A first parameterising step associates a priori conditional probability tables with each RV. The second inferring step modifies the RV values on the basis of corpus evidence (it updates the a priori probabilities into a posteriori ones). The observations made in the corpus are propagated through the network, which leads to updating the a priori values even for some unobserved variables.

[Figure 1: Bayesian Network with the nodes Candidate, Number_Filter, First_NP and Subject_NP and their conditional probability tables]

Fig. 1: Example of a Bayesian classifier represented by a Bayesian Network

Let us explain on a simplified example the inferring mechanism of the Bayesian Network represented on Figure 1. This network chooses the pronoun antecedent by ordering the various couples $(p_i, np_j)$. It is composed of 4 nodes, which respectively represent the probability for the candidate $np_j$ to be the antecedent of $p_i$ (Candidate), to have some morphological properties regarding number (Number_Filter), to be the subject of the verb (Subject_NP) and to be the first NP of the sentence (First_NP).

³ If this link is erased, the classifier becomes a naive Bayesian classifier. More generally, a Bayesian Network whose structure is a tree of depth 1 without any link between leaves is a Naive Bayesian classifier.

The first parameterising step computes the a priori probability values. These probabilities are estimated on the basis of the frequencies computed on the set of couple examples extracted from a training corpus, for which all the attribute values are instantiated. From these observations, we state for instance that P(Candidate=Antecedent)=0.04, i.e. we consider that any candidate has a priori a probability of 4% to be the antecedent of an anaphoric pronoun occurrence ⁴.
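The parameterising step can be illustrated by a small estimator. The paper states that corpus frequencies are combined with expert estimation through a Maximum A Posteriori approach; the sketch below shows one common way to do that, a MAP estimate under a Dirichlet prior. The prior parameters and the counts are hypothetical, chosen only so that the arithmetic is easy to follow, and are not the values used by the authors.

```python
def map_estimate(counts, prior):
    """MAP estimate of a categorical distribution under a Dirichlet prior.

    counts: value -> number of training examples with that value.
    prior:  value -> Dirichlet parameter (>= 1), playing the role of the
            expert estimation; all parameters equal to 1 gives the plain
            relative frequencies.
    """
    k = len(prior)
    total = sum(counts.get(v, 0) for v in prior) + sum(prior.values()) - k
    if total <= 0:
        raise ValueError("no observation and no informative prior")
    return {v: (counts.get(v, 0) + prior[v] - 1) / total for v in prior}


# Hypothetical counts for the Candidate node.
counts = {"Antecedent": 3, "Not_Antecedent": 97}

# Flat prior: relative frequencies only.
frequency_only = map_estimate(counts, {"Antecedent": 1, "Not_Antecedent": 1})

# Hypothetical expert prior that pulls the estimate towards 5%.
with_expert = map_estimate(counts, {"Antecedent": 6, "Not_Antecedent": 96})
```

With these made-up numbers, the frequency-only estimate of P(Candidate=Antecedent) is 3/100 and the expert-corrected one is 8/200.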

The influence link between the variables Candidate and Number_Filter indicates that a candidate is less likely to be plural if it is the antecedent of the pronoun it (conversely, it is less likely to be its antecedent if it is a plural noun). In the same manner, the links between the variables Candidate and First_NP on the one hand, and Candidate and Subject_NP on the other hand, respectively indicate that the candidate is more likely to be the first NP of the preceding sentence and to be the subject of the verb if it is the pronoun antecedent. The link (First_NP, Subject_NP) connects two variables that are considered as dependent on each other on the basis of the training corpus and expert estimation. This means that the reliability of the subject syntactic role is increased if the candidate also occurs at the beginning of a sentence. This interdependency is measured through the table of conditional probabilities that is associated with the node Subject_NP on Figure 1. We also added a value Unknown to the RV of the Subject_NP node as the syntactic analysis quite often fails to associate a grammatical role with some NPs. This is a way to avoid having to deal with incomplete data in the first evaluation of our system [4].

Once all the a priori conditional probabilities have been computed, the inferring step begins. Let's take as an example the couple (citA transcription, $it_1$) extracted from the sentence In minimal medium, [citA transcription]$_1$ was about 6-fold lower when glucose was the sole carbon source than [it]$_1$ was when succinate was the carbon source. Our system computes the values of the attributes of that couple. The candidate is not a plural NP but it is the first NP of the sentence. Since these observations are very reliable, we can state that P(Number_Filter=Singular)=1 and P(First_NP=First)=1 (strong evidence). Even if the parser has produced a dependency analysis of that sentence in which the candidate is the subject of the verb, we know that this analysis may be erroneous and we consider that this third observation is only soft evidence: P(Subject_NP=Subject)=0.89.

On the basis of these observations, the probability for the candidate to be the pronoun antecedent can be computed:

P(Candidate=Antecedent | Number_Filter=Singular, First_NP=First, Subject_NP=Subject) = 0.4
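The inference step on the four-node network can be sketched by direct enumeration. Everything numeric below is hypothetical except the prior P(Candidate=Antecedent)=0.04 and the soft-evidence weight 0.89, which are quoted in the text; the conditional probability tables are invented, so the result is not expected to reproduce the value 0.4. Soft evidence is handled here with Jeffrey's rule, and the remaining 0.11 of belief is assigned to Complement; both choices are assumptions of this sketch, not a description of the authors' inference engine.

```python
# A = Antecedent, N = NotAntecedent. All tables are hypothetical except
# the prior of 0.04, which is the value quoted in the text.
P_CANDIDATE = {"A": 0.04, "N": 0.96}
P_NUMBER = {
    "A": {"Singular": 0.97, "Plural": 0.03},
    "N": {"Singular": 0.60, "Plural": 0.40},
}
P_FIRST = {
    "A": {"First": 0.30, "NotFirst": 0.70},
    "N": {"First": 0.10, "NotFirst": 0.90},
}
# Subject_NP depends on both Candidate and First_NP (the extra link).
P_SUBJECT = {
    ("A", "First"): {"Subject": 0.70, "Complement": 0.20, "Unknown": 0.10},
    ("A", "NotFirst"): {"Subject": 0.50, "Complement": 0.35, "Unknown": 0.15},
    ("N", "First"): {"Subject": 0.40, "Complement": 0.45, "Unknown": 0.15},
    ("N", "NotFirst"): {"Subject": 0.20, "Complement": 0.60, "Unknown": 0.20},
}


def joint(c, number, first, subject):
    """Joint probability given by the network factorisation."""
    return (P_CANDIDATE[c] * P_NUMBER[c][number] * P_FIRST[c][first]
            * P_SUBJECT[(c, first)][subject])


def posterior_hard(number, first, subject):
    """P(Candidate=A | fully observed attributes)."""
    num = joint("A", number, first, subject)
    return num / (num + joint("N", number, first, subject))


def posterior_soft(number, first, subject_belief):
    """Jeffrey's rule: weight each hard posterior by the belief in that value."""
    return sum(w * posterior_hard(number, first, s)
               for s, w in subject_belief.items() if w > 0)


belief = {"Subject": 0.89, "Complement": 0.11}  # split of 0.11 is assumed
soft = posterior_soft("Singular", "First", belief)
hard_subject = posterior_hard("Singular", "First", "Subject")
hard_complement = posterior_hard("Singular", "First", "Complement")
```

The soft-evidence posterior falls between the two hard-evidence posteriors, which is the expected effect of not fully trusting the parser.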

Our system similarly computes the probability for any other NP to be the antecedent of the pronoun $it_1$. If none of the other candidates has a probability higher than 40%, citA transcription is considered to be the antecedent of the pronoun.

⁴ Actually, a part of human expertise is combined with corpus evidence in this probability estimation because the training dataset, although complete, is not fully reliable (some values may be erroneous). To lower that noise effect, we integrate an expert estimation into the a priori probability computation, using the Maximum A Posteriori approach [13].

3.3 An extensive list of classification attributes

We keep all the attributes of MARS, except the C-command constraint that is mostly useful for demonstrative pronoun anaphors (e.g. this) and the preferences specifically designed for the technical type of corpora on which MARS has been initially tested ⁵. We also enrich that list with some additional clues that are relevant for salience calculation and which are used in several other systems described in the state of the art.

The following list details the various properties that are used as attributes by our classifier. Each property is modelled as a node in our Bayesian Network (see Figure 2, where MARS attributes and the additional ones are distinguished: they are respectively coloured in black and grey):

• Gender_Filter and Number_Filter: the candidate must be morphologically compatible with the pronoun occurrence.

• Impersonal_Filter: the candidate cannot be the antecedent of an impersonal pronoun occurrence.

• First_NP: the first NP of the sentence is very often the verb subject.

• Subject_NP: a candidate is more likely to be the antecedent if it is the verb subject than if it holds a different syntactic role.

• Indicative verb: the NPs immediately following the verbs that belong to the indicative class (analyze, check...) are supposed to be complements of these verbs and are more salient than others. For our experiments, this class has been manually acquired from a training corpus.

• Repeated_NP: an NP that is repeated several times in the same paragraph as the pronoun occurrence is more likely to be salient. These repetitions are computed by counting the number of occurrences of the NP head constituent (on the basis of a simple character string comparison).

• Heading_Candidate: NPs occurring in a title or at the beginning of a paragraph are emphasised and are more salient.

• Collocation_Patterns: our system exploits some collocation patterns with order constraints (<NP/pronoun verb> or <verb NP/pronoun>, in which we consider the lemmatised form of the verbs) but also with syntactic constraints (<Subject verb> and <verb complement>). Occurrence frequencies are computed for each candidate head in each type of collocation pattern.

• Term: the NPs belonging to the domain terminology are considered as salient discourse elements.

• Definite_NP: indefinite NPs are less salient than definite ones. We consider that an NP is indefinite if it does not follow a definite, possessive or demonstrative determiner.

• Prepositional_NP: if an NP belongs to a prepositional complement, its salience score is decreased. The prepositional complements are identified through the text constituent analysis.

• Distance: the candidates that are closer to the pronoun occurrence are more likely to be the antecedent.

• Proper_Name: the proper names are salient discourse elements. We consider as proper names all the NPs tagged as such by the POS tagger or tagged as named entities.

• Pronoun_NP: if the candidate is itself an anaphoric pronoun, its own antecedent is considered as a salient candidate for the new pronoun.

• Appositive_NP: if a candidate occurs in an appositive clause, its salience is decreased. The appositive clauses are identified as textual segments that are preceded and followed by the same or symmetric punctuation marks ⁶ and which contain no verb occurrence.

• Syntactic_Parallelism: we check that the candidate has the same syntactic role as the pronoun occurrence.

• Semantic_Class: some semantic classes are more salient than others. For instance, in biological corpora, the genes are more salient than persons.

• Semantic_Consistence: if the candidate is a named entity, we check that it is semantically coherent with the pronoun occurrence. We list the semantic classes of the NPs occurring in the same collocation patterns as the pronoun occurrence and we check that the candidate semantic class is one of those.

⁵ Namely, the immediate reference and sequential instruction preferences.
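Two of the simpler attributes in the list above, Definite_NP and Repeated_NP, can be sketched as plain functions. This is an illustration only: the determiner list, the function names and the convention that the determiner is the first token of the NP are assumptions, not details given in the paper.

```python
# Hypothetical list of definite, possessive and demonstrative determiners.
DEFINITE_DETERMINERS = {
    "the", "this", "that", "these", "those",
    "my", "your", "his", "her", "its", "our", "their",
}


def is_definite(np_tokens):
    """Definite_NP: True if the NP starts with a definite, possessive
    or demonstrative determiner (first-token convention assumed)."""
    return bool(np_tokens) and np_tokens[0].lower() in DEFINITE_DETERMINERS


def repetition_count(head, paragraph_heads):
    """Repeated_NP: number of NP heads in the paragraph that are equal
    to `head`, using a simple character string comparison."""
    return sum(1 for h in paragraph_heads if h == head)


heads = ["transcription", "medium", "transcription", "source"]
definite = is_definite(["The", "gene"])
indefinite = is_definite(["a", "promoter"])
repeats = repetition_count("transcription", heads)
```

Each function returns a value that could be fed to the corresponding network node as an observed attribute.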

4 Experiments and results

4.1 Description of the classifiers

We have used 6 different classifiers for the anaphora resolution.

Three of them are used as baseline systems: Random system, which randomly chooses the antecedent among the candidate list, First_NP system, which systematically selects the first NP of the preceding sentence as the pronoun antecedent, and Bio_MARS, which is our version of Mitkov's MARS system. The solving algorithm of Bio_MARS is the same as that of MARS but our system is specifically designed for genomics. The preprocessing includes the following steps: the NP list

⁶ Except for parentheses, which are often used for acronyms in biological corpora.