Meta-level constraints for linguistic domain interaction


HAL Id: hal-00244501

https://hal.archives-ouvertes.fr/hal-00244501

Submitted on 7 Feb 2008

Philippe Blache

To cite this version:

Philippe Blache. Meta-level constraints for linguistic domain interaction. International Workshop on Parsing Technologies (IWPT), 2003, Nancy, France. pp.245-246. ⟨hal-00244501⟩

Philippe Blache

LPL-CNRS, Université de Provence

29 Avenue Robert Schuman, 13621 Aix-en-Provence, France

pb@lpl.univ-aix.fr

Submission type: Short paper

Abstract

This paper presents a technique for the representation and the implementation of interaction relations between different domains of linguistic analysis. This solution relies on the localization of the linguistic objects in the context. The relations are then implemented by means of interaction constraints, the information of each domain being expressed independently.

1 Introduction

Descriptive linguistics as well as natural language processing are faced with the question of integrating different sources of information, coming from different domains of linguistic analysis such as prosody, phonology, syntax, discourse, semantics, etc. None of these domains can be treated independently. More precisely, the interaction between domains itself carries much information that is not directly accessible. It is then necessary to explain how such interaction is possible. Unfortunately, even though many works describe the interface between two of these domains (e.g. prosody/syntax interaction), none of them provides a general framework for (1) representing and (2) implementing such interaction. Both questions are equally important. Indeed, we think that the main obstacles of the classical approaches come from the fact that relations between domains are classically expressed between high-level structures (e.g. a syntactic tree and a prosodic hierarchy) and that these approaches (typically the generative ones in syntax) cannot easily deal with partial, spread or even ill-formed information.

We propose in this paper some elements of an answer to these problems, in which the representation level relies on an anchoring system that makes it possible to localize any kind of information at any level. The interaction itself can then be implemented directly by means of interaction constraints. In a fully constraint-based approach such as the one proposed here, interaction constraints exploit the interpretation of the state of the constraint system for each domain in order to propagate new information: they thus constitute a meta-level. This method has several advantages. First, it constitutes an efficient tool for controlling the parse of the different domains and directly implements some disambiguation information (see section 4). But it also represents a first step towards a general account of a multi-perspective linguistic analysis in which [...] partial information coming from these domains and second on the interaction between them.

2 Domain interaction

The question of the interaction between different components of linguistic analysis is generally addressed in terms of relations between structures. In this perspective, it becomes very difficult to consider more than two structures at the same time, and this probably explains why existing works usually take into consideration only two components (prosody/syntax, syntax/semantics, etc.). Such approaches present several problems. One is the necessity of representing information and rules within a unique formalism: relations depend in this case on the way information is represented. Moreover, this requires a very specific architecture consisting in first building the respective structures, then analyzing them, and finally applying some interaction rules expressed in terms of correspondence relations between these structures. We think that one of the problems comes from the choice of the interaction level between the components. It seems preferable to use a low-level anchoring system that makes it possible to localize the information in the input. It then becomes possible to represent information over a given segment of the input instead of a structure. In this perspective, relations between domains are independent from any formalism and rely on the characterization of some properties from each domain.

We present in the following some interaction examples between different domains. [Bear90] proposes an implementation of the interaction between prosodic breaks and syntactic constituents. The authors observe that when a large prosodic break appears between two words, they do not combine to form a constituent in which the corresponding categories are sisters. In other words, no major prosodic break can separate a lexical head and a juxtaposed complement, whereas rather long breaks can appear between two complements. This kind of information is of great help during a parse and makes it possible to resolve many ambiguous attachments. The authors represent this information directly in the grammar by inserting a new category, called Link, between each category of the right-hand side of a phrase structure rule. Each Link can be constrained in its possible values. For example, in the rule VP → V Link PP, the break between V and PP cannot be greater than 2 (on a scale of 0-5). Two possibilities follow from this integrative representation. Either we think it possible and necessary to represent a full prosodic description containing information other than breaks (such as tone, accent, duration, etc.). In this case, the insertion of prosodic information into PS-rules requires a complete superposition of prosodic and syntactic structures. The second possible choice consists in considering breaks as syntactic categories. In our opinion, these interpretations are equally bad.
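As a rough illustration of the Link idea described above, the following sketch attaches a maximum break index to each gap between adjacent daughters of a rule. The class name `LinkRule`, its fields and the method `licenses` are hypothetical names chosen for this example and are not taken from [Bear90]; only the rule VP → V Link PP and the limit of 2 on a 0-5 scale come from the text.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LinkRule:
    """A phrase structure rule with one break limit per Link.

    Hypothetical representation: max_breaks[n] is the largest break index
    allowed between daughter n and daughter n + 1.
    """
    lhs: str
    rhs: Tuple[str, ...]
    max_breaks: Tuple[int, ...]

    def licenses(self, daughters: Sequence[str], breaks: Sequence[int]) -> bool:
        """True if the daughters match the rule and every observed break
        stays within the limit of the corresponding Link."""
        if tuple(daughters) != self.rhs or len(breaks) != len(self.max_breaks):
            return False
        return all(b <= limit for b, limit in zip(breaks, self.max_breaks))


# VP -> V Link PP, where the break between V and PP cannot exceed 2.
vp_rule = LinkRule("VP", ("V", "PP"), (2,))
weak_break_ok = vp_rule.licenses(["V", "PP"], [1])    # break index 1
strong_break_ok = vp_rule.licenses(["V", "PP"], [4])  # break index 4
```

The sketch makes the criticism in the text concrete: the prosodic limit lives inside the syntactic rule, so any richer prosodic description would have to be folded into the same rule format.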

Another example of prosody/syntax interaction is given in [Hirst93]. The author proposes a rule predicting the possible intonational phrases from a syntactic tree. This rule is formulated as [...] tree, where [X] is a major category (S, NP, VP or PP)". In the case of the tree [S [NP Jane] [VP [V gave] [NP the book] [PP to [NP Mary]]]] the rule predicts the following phrasings:

1. (Jane gave the book to Mary)
2. (Jane gave) (the book to Mary)
3. (Jane gave the book) (to Mary)
4. (Jane) (gave the book) (to Mary)
5. (Jane) (gave) (the book) (to Mary)
6. (Jane gave) (the book) (to Mary)

This kind of rule is also highly dependent on the structure and, more generally, on the formalism. In this case, the information is not integrated into the grammar as in the previous example: the rule is situated at a higher level, which gives a kind of priority to the syntactic structure, since it has to be built before the rule is applied.

The third example illustrates a less studied interaction, between graphics and texts. [Pineda00] proposes a description of coreferences between objects from different domains. The problem consists in associating a text and a map. Several objects are described in both sources; the question is to find the coreferent ones. This consists for example in associating a point with a city, a line with a border, etc., and then in resolving the reference by means of information coming from one domain or another. For example, let us imagine a line between two points and a text stating that Paris is to the west of Berlin. It then becomes possible to associate them respectively with the left and the right point. [Pineda00] proposes a multimodal version of DRT (see [Kamp93]) in which all possible referents (for each domain) are indicated together with their properties, plus an interaction level specifying some translation constraints between the domains. In this case, each domain keeps in a certain sense its autonomy; the interaction is represented by the fact that there is a common set of objects plus some equations unifying them. This technique relies on the fact that both domains give information over semantic objects, whereas in the previous examples information was given over objects located at the same position in the signal. However, as in the previous cases, interaction is described in terms of superposition: it is implemented by means of a translation from the language of one domain into the language of the other.

These examples illustrate several problems. It is clear that the different linguistic domains interact. But this can only exceptionally be described in terms of structure superposition (as for the morphology/phonology interaction described in [Bird94]). Usually, there is a certain kind of correspondence between subparts of domain information, as described in [Hirst93]. But it seems difficult, or even impossible, to systematize such an approach in order to implement all the possible domain interactions.

3 Anchoring the different levels

An important part of the problem consists in finding an interface point between domains rather than an alignment between structures. As is the case in multimodal communication, several parameters have to be taken into account, in particular redundancy and synchronicity. [...] [Kettebekov02]). In some other cases, it is asynchronous but redundant, in the sense that it refers to the same interpretation domain. In both cases, there exists a common point making it possible to indicate that two sets of properties refer to the same object.

We propose to specify a new kind of feature describing a position (or more generally a localization) that can be associated with a piece of information. This idea of referring to information by means of its localization has been tried in corpus annotation work (see [Bird01] or [Blache01a]). We propose here to define a generic solution for indexing any kind of information. For some domains (typically prosody), a temporal indexing comes naturally to mind. But, as shown before, it is not adequate for all domains. A linear indexing over the string is for example necessary for indexing written material. Finally, we also need to index information that is not usually associated with a given position but more generally with a context. This is typically the case for discourse information. We then propose to use an anchor, represented by a complex feature as follows:

$$
\text{anchor}\ \begin{bmatrix} \text{temporal} & i, j \\ \text{position} & k, l \\ \text{context} & c \end{bmatrix}
$$

The temporal index is represented by two values (beginning and end). The position is also a pair of indexes (corresponding to nodes in a chart interpretation) localizing an object in the input. The context feature implements the notion of universe (i.e. a set of discourse referents) as in DRT. An object can then be specified by means of different kinds of information: its domain and its characterization (the set of corresponding properties), which contains its anchor. The following example describes an object from the syntactic domain, with a precise localization on both the temporal and the linear axis:

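$$
\text{obj}\ \begin{bmatrix} \text{domain} & \text{synt} \\ \text{charac} & \begin{bmatrix} \text{cat} & \text{Det} \\ \text{anchor} & \begin{bmatrix} \text{temp} & 880, 1000 \\ \text{position} & 2, 3 \end{bmatrix} \end{bmatrix} \end{bmatrix}
$$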
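The anchor and object descriptions above can be mirrored by a small data structure. This is a minimal sketch under stated assumptions: the names `Anchor`, `LinguisticObject` and `same_location` are hypothetical, each index is optional because the text notes that not every domain uses every axis, and the comparison rule (agreement on at least one shared index) is one possible reading rather than the paper's definition.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class Anchor:
    """Localization of a piece of information (all indexes optional)."""
    temporal: Optional[Tuple[int, int]] = None  # (beginning, end) on the time axis
    position: Optional[Tuple[int, int]] = None  # (start node, end node) in a chart
    context: Optional[str] = None               # name of a universe of discourse referents


@dataclass
class LinguisticObject:
    """An object given by its domain and its characterization, which holds the anchor."""
    domain: str
    properties: Dict[str, Any]
    anchor: Anchor


def same_location(a: Anchor, b: Anchor) -> bool:
    """Assumed rule: two anchors meet if they agree on an index both define."""
    pairs = [(a.temporal, b.temporal), (a.position, b.position), (a.context, b.context)]
    return any(x is not None and x == y for x, y in pairs)


# The determiner of the example: temporal index 880, 1000 and position 2, 3.
det = LinguisticObject("synt", {"cat": "Det"}, Anchor(temporal=(880, 1000), position=(2, 3)))
meets_by_position = same_location(det.anchor, Anchor(position=(2, 3)))
meets_by_context = same_location(det.anchor, Anchor(context="c1"))
```

Keeping the anchor separate from the other properties lets objects from different domains be compared without knowing anything about their internal structure.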

4 Meta-level constraints

Representing interaction between different linguistic domains requires the possibility of representing direct relations between the objects of these domains. But this is not sufficient: in most cases, such interaction relations require more information, in particular the local relations that can exist between objects (e.g. function in syntax). This kind of multi-level information is easily accessible when using a constraint-based approach in which all information, at any level, is represented by means of constraints (also conceived as properties). We describe here such an approach, called Property Grammars, and show how it can deal with different levels of constraint.

We present in this section the formalism of Property Grammars (see [Blache00]), in which all information is represented by means of constraints. Concerning syntax, the following set of constraints can be used: linearity, dependency, obligation, exclusion, requirement and uniqueness (see footnote 1). They can be presented as follows:

| Constraint | Definition | Example |
|---|---|---|
| Linearity (≺) | Linear precedence constraints. | Det ≺ N |
| Dependency (⇝) | Dependency relations between categories. | AP ⇝ N |
| Obligation (↦) | Set of compulsory and unique categories. One of these categories (and only one) has to be realized in a phrase. | N ↦ NP |
| Exclusion (⇎) | Restriction of cooccurrence between sets of categories. | N[pro] ⇎ Det |
| Requirement (⇒) | Mandatory cooccurrence between sets of categories. | N[com] ⇒ Det |
| Uniqueness (Uniq) | Set of categories which cannot be repeated in a phrase. | Uniq(NP) = {Det, N, AP, PP, Pro} |
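To make the constraint types listed above more tangible, here is a minimal sketch that evaluates five of them over a flat list of category labels for a noun phrase. All function names are hypothetical, the encoding of features as suffixes such as `N[com]` is an assumption made for the example, and dependency is left out because it records a relation rather than a yes/no check.

```python
def matches(cat, label):
    """A bare label such as "N" also matches feature-bearing variants like "N[com]"."""
    return cat == label or ("[" not in label and cat.split("[")[0] == label)


def positions(cats, label):
    """Indexes of the categories matching the label."""
    return [i for i, cat in enumerate(cats) if matches(cat, label)]


def linearity(cats, first, second):
    """Every occurrence of `first` precedes every occurrence of `second`."""
    return all(i < j for i in positions(cats, first) for j in positions(cats, second))


def obligation(cats, heads):
    """Exactly one of the compulsory categories is realized."""
    return sum(len(positions(cats, label)) for label in heads) == 1


def exclusion(cats, a, b):
    """The two categories do not cooccur."""
    return not (positions(cats, a) and positions(cats, b))


def requirement(cats, a, b):
    """If `a` is present, `b` must be present too."""
    return not positions(cats, a) or bool(positions(cats, b))


def uniqueness(cats, unique):
    """None of the listed categories is repeated."""
    return all(len(positions(cats, label)) <= 1 for label in unique)


def characterize_np(cats):
    """Status (satisfied or not) of each NP constraint taken from the table's examples."""
    return {
        "linearity": linearity(cats, "Det", "N"),
        "obligation": obligation(cats, ["N"]),
        "exclusion": exclusion(cats, "N[pro]", "Det"),
        "requirement": requirement(cats, "N[com]", "Det"),
        "uniqueness": uniqueness(cats, ["Det", "N", "AP", "PP", "Pro"]),
    }


well_formed = characterize_np(["Det", "N[com]"])
reversed_order = characterize_np(["N[com]", "Det"])
```

Note that `characterize_np` never rejects an input: it only reports which constraints hold, so `reversed_order` simply records a violated linearity constraint alongside the satisfied ones.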

Each category is described in the grammar with a set of such constraints. A grammar then corresponds to a constraint system. In this approach, analyzing an input amounts to evaluating the constraint system. The state of the system after evaluation contains, for each category, the set of constraints together with their status (satisfied or not). This result (called a characterization) contains all the information necessary (actually more than a classical syntactic structure) to specify precisely the syntactic properties of the input.

In this approach, the general parsing mechanism (see [Blache01b]) consists, starting from the set of lexical categories, in identifying all the relations connecting the categories. As a side effect, this process can instantiate new feature values as well as new categories. The following schema presents the core of the process. It consists in checking, for every subset of categories, whether it can be evaluated with respect to the constraint system. If so, the set of evaluated constraints is added to the characterization of the corresponding category X. This characterization is in turn added to the constraint store of the domain, and the new category X is added to the set of categories.

1. S = set of categories
2. for each S' ⊆ S
3.   SAT(S') ⇝ X
4.   if X ≠ ∅
5.     Charac(X) ← SAT(S')
6.     Store(X) ← Charac(X)
7.     S ← S ∪ {X}

At the end of the process, we obtain a set of categories together with their characterizations. It is then possible to exhibit one (or several) solutions which correspond to a total coverage of the input. It is important to notice that a characterization can contain non-satisfied constraints, which means that it is possible to characterize any kind of input, whether grammatical or not.
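The schema above can be sketched as a fixpoint loop. This is a toy version under several assumptions: `sat` is a caller-supplied function (here the hypothetical `toy_sat`) that returns a new category and its evaluated constraints, or `None`; subsets are bounded by `max_size` to keep the search small; and positions in the input are ignored, unlike in a real chart-based parser such as the one in [Blache01b].

```python
from itertools import combinations


def parse(lexical_categories, sat, max_size=3):
    """Repeat the SAT step over subsets until no new category appears.

    Returns the final set of categories and the store mapping each
    built category to its characterization.
    """
    categories = set(lexical_categories)
    store = {}
    changed = True
    while changed:
        changed = False
        for size in range(1, max_size + 1):
            # sorted() copies the set, so adding categories below is safe.
            for subset in combinations(sorted(categories), size):
                result = sat(frozenset(subset))
                if result is None:
                    continue
                new_category, evaluated = result
                if new_category not in store:
                    store[new_category] = evaluated   # Charac(X), then Store(X)
                    categories.add(new_category)      # S <- S U {X}
                    changed = True
    return categories, store


def toy_sat(subset):
    """Hypothetical constraint system with two phrase types."""
    if subset == frozenset({"Det", "N"}):
        return "NP", {"Det < N": True, "N requires Det": True}
    if subset == frozenset({"V", "NP"}):
        return "VP", {"V < NP": True}
    return None


categories, store = parse({"Det", "N", "V"}, toy_sat)
```

In this run the NP is built on the first pass and the VP on the second, once NP has joined the set of categories. A fuller version would also keep unsatisfied constraints in each characterization, as the text points out.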

1. It may be that other kinds of constraints are necessary (e.g. the juxtaposition relation). One simply has to add the required constraint to the system without modifying the general architecture.

4.2 A meta-level for the description of interaction

The description of domain interaction takes advantage of the constraint-based approach presented above. The idea is to propose a mechanism making it possible to infer new properties according to the different characterizations produced for different domains. In other words, this new kind of constraint specifies a relation between characterizations (rather than between categories). Insofar as different sources of information, coming from different domains, are involved in these relations, the characterizations have to specify the domain and the anchor. A first approximation of the interaction relation can be represented as follows:

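$$
\left\{ \text{obj}_i \begin{bmatrix} \text{domain} & d_i \\ \text{charac} & \begin{bmatrix} c_i \\ \text{anchor}\ a_i \end{bmatrix} \end{bmatrix}, \ldots, \text{obj}_j \begin{bmatrix} \text{domain} & d_j \\ \text{charac} & \begin{bmatrix} c_j \\ \text{anchor}\ a_j \end{bmatrix} \end{bmatrix} \right\}
\Rightarrow
\left\{ \text{obj}_k \begin{bmatrix} \text{domain} & d_k \\ \text{charac} & \begin{bmatrix} c_k \\ \text{anchor}\ a_k \end{bmatrix} \end{bmatrix}, \ldots, \text{obj}_l \begin{bmatrix} \text{domain} & d_l \\ \text{charac} & \begin{bmatrix} c_l \\ \text{anchor}\ a_l \end{bmatrix} \end{bmatrix} \right\}
\tag{1}
$$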

Such a relation means that when the different characterizations {obj_i, ..., obj_j}, possibly coming from different domains, are exhibited, then the new properties stipulated in the characterizations {obj_k, ..., obj_l} are added to the general description. Moreover, it is possible (even necessary) to specify a kind of meeting point between the domains indicating that the different characterizations specify the same phenomenon. This is done by means of the anchor feature. Two kinds of relations can be used in such interaction constraints: an inference one, similar to the requirement relation in property grammars, and an exclusion one stipulating a cooccurrence restriction between two characterizations. The general schema now consists in building characterizations of each domain and propagating new properties according to the interaction constraints. This propagation is done at the same time as the satisfaction process: new properties are propagated through interaction as soon as the corresponding characterizations are instantiated. The evaluation of the interaction constraint constitutes in itself a part of a general characterization of the input. It then establishes some relations (requirement or exclusion) between categories that can have a disambiguation effect.

We illustrate this aspect in the following with an example of an interaction constraint implementing the relation described in [Bear90] and presented in section 2. It stipulates that no major break can separate two juxtaposed sisters connected by a complementation relation (represented by ⇝). The anchoring information makes it possible to situate each object. This is the main interest of such a representation: an object only has to be located, its properties can [...]

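$$
\begin{bmatrix} \text{dom} & \text{synt} \\ \text{char} & \begin{bmatrix} \begin{bmatrix} \text{cat} & c_1 \\ \text{anch} & \begin{bmatrix} \text{temp} & t_1, t_2 \\ \text{pos} & i, j \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \text{cat} & c_2 \\ \text{anch} & \begin{bmatrix} \text{temp} & t_3, t_4 \\ \text{pos} & k, l \end{bmatrix} \end{bmatrix} \\ \text{dep}\ c_2 \leadsto c_1 \end{bmatrix} \end{bmatrix}
\not\Leftrightarrow
\begin{bmatrix} \text{dom} & \text{pros} \\ \text{char} & \begin{bmatrix} \text{cat} & \text{break} \\ \text{anch} & \begin{bmatrix} \text{temp} & t_2, t_3 \\ \text{pos} & j, k \end{bmatrix} \end{bmatrix} \end{bmatrix}
\tag{2}
$$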

This interaction constraint connects two characterizations coming from the prosodic and the syntactic domains. Such an interaction constraint typically works for attachment disambiguation. In case of ambiguity (for example in PP attachment), the interpretation favored by this constraint is the one with attachment at the higher level when a major break precedes the PP.
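The disambiguation effect of this exclusion constraint can be sketched as a filter over candidate attachments. The names `Cat`, `Break`, `violates_exclusion` and `prune`, the numeric threshold `MAJOR` for what counts as a major break, and the position values in the example are all assumptions made for illustration; the sketch also assumes that the head precedes its complement.

```python
from typing import List, NamedTuple, Tuple


class Cat(NamedTuple):
    label: str
    pos: Tuple[int, int]   # (start node, end node) in the input


class Break(NamedTuple):
    strength: int          # break index, assumed on a 0-5 scale
    pos: Tuple[int, int]   # nodes between which the break is anchored


MAJOR = 3  # hypothetical threshold for a "major" break


def violates_exclusion(head: Cat, complement: Cat, breaks: List[Break]) -> bool:
    """True if a major break is anchored exactly between the head and the
    complement that depends on it (the two being juxtaposed)."""
    gap = (head.pos[1], complement.pos[0])
    return any(b.pos == gap and b.strength >= MAJOR for b in breaks)


def prune(attachments, breaks):
    """Keep only the (head, complement) pairs allowed by the constraint."""
    return [(h, c) for h, c in attachments if not violates_exclusion(h, c, breaks)]


# A PP that could attach either to the preceding noun or, higher, to the verb.
verb = Cat("V", (0, 1))
noun = Cat("N", (2, 3))
pp = Cat("PP", (3, 6))
breaks = [Break(strength=4, pos=(3, 3))]   # major break right before the PP
kept = prune([(noun, pp), (verb, pp)], breaks)
```

Only the verb attachment survives, because the break sits between the noun and the PP while the verb and the PP are not juxtaposed, which matches the preference for the higher attachment described above.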

5 Perspectives

Interaction constraints can represent many different kinds of information. In particular, they can be generalized to the representation of multimodal relations by means of the proposed anchoring system, including temporal and contextual indexes. We present in this section some examples illustrating these aspects.

The first constraint implements a coreference relation by means of unification. In this case, the interaction constraint is represented with a conjunction. It involves three characterizations coming from three different domains.

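$$
\begin{bmatrix} \text{dom} & \text{gesture} \\ \text{char} & \begin{bmatrix} \text{deictic} & \\ \text{anch} & \begin{bmatrix} \text{temp} & i, j \\ \text{cont} & C \end{bmatrix} \end{bmatrix} \end{bmatrix}
\wedge
\begin{bmatrix} \text{dom} & \text{lang} \\ \text{char} & \begin{bmatrix} \text{sem} & [\,\text{ref}\ x\,] \\ \text{anch} & [\,\text{temp}\ i, j\,] \end{bmatrix} \end{bmatrix}
\wedge
\begin{bmatrix} \text{dom} & \text{graph} \\ \text{char} & \begin{bmatrix} \text{sem} & [\,\text{ref}\ x\,] \\ \text{anch} & [\,\text{cont}\ c_1 \in C\,] \end{bmatrix} \end{bmatrix}
\tag{3}
$$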

The constraint (3) represents a relation between the gesture, graphics and language domains, occurring for example during TV weather broadcasts. The constraint indicates that a deictic gesture (see [Kettebekov02]), in a certain universe (noted C) at a given time, stipulates a coreference between an object specified in the language domain (for example a pronoun) at the same time position and a discourse referent from the graphical domain (for example a map) that belongs to the universe C. This constraint is formalized as a conjunction (rather than an implication) indicating a covariation, the different object descriptions being at the same level.
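One way to read constraint (3) operationally is as a check over three object descriptions that, when it succeeds, identifies the two referents. The sketch below is an illustration only: the dictionary keys, the `universes` table mapping a universe name to the contexts it contains, and the function name `corefer` are hypothetical, and unification is reduced to returning a binding between the two referent names.

```python
def corefer(gesture, phrase, graphic, universes):
    """Return a binding between the linguistic and graphical referents
    when the three descriptions satisfy the conjunction, else None."""
    is_deictic = gesture["kind"] == "deictic"
    same_time = gesture["temp"] == phrase["temp"]
    in_universe = graphic["context"] in universes.get(gesture["context"], set())
    if is_deictic and same_time and in_universe:
        return {phrase["ref"]: graphic["ref"]}
    return None


# A pointing gesture made while a pronoun is uttered, over a map region.
universes = {"C": {"c1", "c2"}}
gesture = {"kind": "deictic", "temp": (120, 180), "context": "C"}
pronoun = {"ref": "x", "temp": (120, 180)}
region = {"ref": "map_region", "context": "c1"}

binding = corefer(gesture, pronoun, region, universes)
late_pronoun = {"ref": "x", "temp": (300, 340)}
no_binding = corefer(gesture, late_pronoun, region, universes)
```

The temporal anchor links the gesture to the language object, while the context anchor links the gesture to the graphical object; neither link relies on the internal structure of the three descriptions.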

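$$
\begin{bmatrix} \text{dom} & \text{ling} \\ \text{char} & \begin{bmatrix} \text{sem} & \begin{bmatrix} \text{ref} & x \\ \text{content} & \begin{bmatrix} \text{quant} & \exists x \\ \text{rel} & \text{weaken}(x) \end{bmatrix} \end{bmatrix} \\ \text{anchor} & [\,\text{context}\ c_1\,] \end{bmatrix} \end{bmatrix}
\wedge
\begin{bmatrix} \text{dom} & \text{graphics} \\ \text{char} & \begin{bmatrix} \text{sem} & \begin{bmatrix} \text{ref} & y \\ \text{content} & \begin{bmatrix} \text{quant} & \exists y \\ \text{rel} & \text{storm}(x) \end{bmatrix} \end{bmatrix} \\ \text{anchor} & [\,\text{context}\ c_1\,] \end{bmatrix} \end{bmatrix}
\Rightarrow \cdots
\tag{4}
$$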
