Balanced parentheses in NL texts : a useful cue in the syntax/semantics interface

HAL Id: hal-01096737

https://hal.archives-ouvertes.fr/hal-01096737

Submitted on 18 Dec 2014

Gabriel G. Bès, Veronica Dahl

To cite this version:

Gabriel G. Bès, Veronica Dahl. Balanced parentheses in NL texts: a useful cue in the syntax/semantics interface. Workshop on Prospects and Advances in the Syntax/Semantics Interface, 2003, Nancy, France. hal-01096737

Gabriel G. Bès, Université Blaise-Pascal, GRIL
Veronica Dahl, Simon Fraser University, Computing Sciences Department

Abstract

Balanced parentheses on text sentences can be obtained from information on particular morphemes (the introducers) and on inflected verbal forms. From balanced parentheses, a partial graph of the sentence in the semantics interface can be deduced, along with other information. The hypothesis and its expression with CHR constraints are presented.

1 The basic hypothesis

Many formal languages use parentheses, left ones (lp) and right ones (rp). They are balanced: at the end of a well-formed expression N(lp) = N(rp) (where N: number), and, at any point of it, N(lp) ≥ N(rp).
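The two balance conditions can be checked in a single left-to-right pass. The following is a minimal sketch in Python; the function name `is_balanced` is ours and does not come from the paper.

```python
def is_balanced(symbols):
    """Check N(lp) >= N(rp) at every point and N(lp) == N(rp) at the end."""
    depth = 0  # N(lp) - N(rp) over the prefix read so far
    for s in symbols:
        if s == "(":
            depth += 1
        elif s == ")":
            depth -= 1
            if depth < 0:  # more rp than lp at this point
                return False
    return depth == 0
```

Any symbol other than a parenthesis is ignored, so the same check applies to a parenthesized tag string.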

Parentheses are well identified objects in formal languages. Classified under the label of "auxiliary symbols", they do not have intrinsic semantic value, but they are crucially important for the specification of operators' domains. In Montague Grammar (Montague, 1974), their expressive power is even greater: indexed parentheses encode the syntactic operation from which they follow.

Parentheses are widely used in formal or quasi-formal syntactic representations. But, as pointed out by Hintikka (Hintikka, 1994), they are not "natural" objects. They belong to the syntactic machinery of the metalanguage used to describe NL expressions, and as such, their use can freely change from one machinery to another, even if they remain balanced.

Authors in alphabetical order. Thanks are given to Caroline Hagège for extended and enlightening discussions in the preliminaries of this work, and to François Trouilleux for his comments.

Gabriel G. Bès: Gabriel.Bes@univ-bpclermont.fr; 34 Ave. Carnot, F63037 Clermont-Fd cedex.

Veronica Dahl: veronica@sfu.ca; 8888 University Dr., Burnaby

But not all formal languages need parentheses as auxiliary symbols. The Polish notation of first-order logic does not require them. Our central hypothesis is a kind of answer to Hintikka's challenge-objection: balanced parentheses can indeed be deduced, and not stipulated, from an adequate analysis of NL expressions. Furthermore, they lead to a partial graph, which can be used as an important cue in the syntax/semantics interface.

Balanced parentheses can be obtained from an adequate analysis of a subset of grammatical morphemes such as French si, que, ..., the introducers, and inflected verbal forms: inflected chunks (e.g. a lu, lui a donné) or inflected verbs (e.g. aimait, parle). Metaphorically, they allow one to jump to the roof of a sentence from poor information on local marks in its foundations.

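2 From local information to the partial graph

The balanced parentheses hypothesis can be illustrated by the following (i), analyzed by the subsequent (ii) to (v).

(i) Si les parents s'étaient mis d'accord hier et avaient bien connu la réglementation, les bureaucrates à qui ils se sont adressés aujourd'hui ne leur auraient pas répondu que c'était impossible, ils auraient dû présenter leur dossier autrement.
[English gloss: If the parents had reached an agreement yesterday and had known the regulations well, the bureaucrats they approached today would not have answered them that it was impossible; they should have presented their file differently.]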

With respect to (i), it is possible to say that there are inflected nuclear verbal phrases (vn), such as se sont adressés, that in one case two vns coordinate (s'étaient mis d'accord and avaient bien connu), that this verbal coordination is the verbal form of the conditional sentence, that the verb form of the root sentence is ne leur auraient pas répondu [...] with auraient dû as verbal form is coordinated to the root sentence.

Furthermore, it is possible to say that there are morphological expressions, simple (as si) or complex (as à qui), which flag a coming vn; these are the introducers. For instance, se sont adressés is introduced by à qui, auraient dû by the nominative form ils. The first vn of the coordinated verbal form of the conditional sentence is introduced by si, and the vn of the root sentence is introduced by a hidden il (initial limit), assumed as first element of any expression, as fp (final point) is the final one.

If we introduce an lp at the left of il and an rp at the right of fp, associate an lp to each introducer and an rp to each introduced vn not coordinated with another vn, and if we associate an lp to the first vn of coordinated vns and two rp to the right of the last coordinated vn, balanced parentheses on the whole sentence are obtained.

Thus, from (i), the following (ii) is obtained. In (ii), '-' joins single expressions in (i), obtaining chunk expressions which are computed, with respect to position and tags, as the simple ones. A position is assigned in (ii) to each expression jointly with a tag from a very restricted vocabulary V = {int, v, v1, v2, il, fp, ot}.

Besides il and fp, presented earlier, int in V is associated to introducers, v is associated to vns not immediately preceded either by ',' or by a coordination form, v1 is associated to vns immediately preceded by ',', v2 is associated to vns immediately preceded by a coordination form (as et, ou, ...) not preceded by a ',', and ot (other) is associated to any expression which is not associated to one of the previous tags. INT will spell both int and il.

(ii) ((il <0;il> (Si <1;int> les <2;ot> parents <3;ot> (s'-étaient-mis-d'accord <4;v> hier <5;ot> et-avaient-bien-connu <6;v2> )) la <7;ot> réglementation <8;ot> , <9;ot> les <10;ot> bureaucrates <11;ot> (à-qui <12;int> ils <13;ot> se-sont-adressés <14;v> ) aujourd'hui <15;ot> ne-leur-auraient-pas-répondu <16;v> ) (que <17;int> la <18;ot> chose <19;ot> était <20;v> ) impossible <21;ot> , <22;ot> (ils <23;int> auraient-dû <24;v> ) présenter <25;ot> leur <26;ot> dossier <27;ot> autrement <28;ot> fp)

If we eliminate NL expressions, leaving only tags and positions, (iii) is obtained. In (iv), '...', spelling intervals, substitutes for ots.

(iii) ((int 0 (int 1 ot 2 ot 3 (v 4 ot 5 v2 6 )) ot 7 ot 8 ot 9 ot 10 ot 11 (int 12 ot 13 v 14 ) ot 15 v 16 ) (int 17 ot 18 ot 19 v 20 ) ot 21 ot 22 (int 23 v 24 ) ot 25 ot 26 ot 27 ot 28 fp)

(iv) ((int 0 (int 1 ... (v 4 ... v2 6 )) ... (int 12 ... v 14 ) ... v 16 ) (int 17 ... v 20 ) ... (int 23 v 24 ) ... fp)
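The parenthesization rules stated before (ii) can be sketched as a small Python function that rebuilds (iii) from the bare tag sequence. This is an illustrative sketch, not the authors' algorithm: the names `coordination_chains`, `parenthesize` and `TAGS` are ours, and the sketch assumes that every v1 or v2 coordinates with the closest vn to its left unless an introducer intervenes, which the paper does not claim in general.

```python
INTRODUCERS = {"il", "int"}  # INT spells both int and il

def coordination_chains(tags):
    """Group vn positions into chains; a chain of length 1 is a non-coordinated vn."""
    chains, current = [], None
    for pos, tag in enumerate(tags):
        if tag in INTRODUCERS:
            current = None                 # an introducer ends any open chain
        elif tag == "v" or (tag in ("v1", "v2") and current is None):
            current = [pos]                # this vn starts a new chain
            chains.append(current)
        elif tag in ("v1", "v2"):
            current.append(pos)            # assumption: coordinates with the vn on its left
    return chains

def parenthesize(tags):
    """Return the parenthesized tag string in the style of (iii)."""
    opens, closes = {}, {}
    for chain in coordination_chains(tags):
        if len(chain) == 1:
            closes[chain[0]] = 1           # one rp after a non-coordinated vn
        else:
            opens[chain[0]] = 1            # one lp before the first coordinated vn
            closes[chain[-1]] = 2          # two rp after the last coordinated vn
    out = ["("]                            # lp at the left of il
    for pos, tag in enumerate(tags):
        if tag == "fp":
            out.append("fp")
        elif tag in INTRODUCERS:
            out.append(f"(int {pos}")      # one lp for each introducer
        else:
            item = "(" * opens.get(pos, 0) + f"{tag} {pos}"
            if pos in closes:
                item += " " + ")" * closes[pos]
            out.append(item)
    out.append(")")                        # rp at the right of fp
    return " ".join(out)

# Tags of example (i), positions 0 to 28, followed by fp.
TAGS = ("il int ot ot v ot v2 ot ot ot ot ot int ot v ot v "
        "int ot ot v ot ot int v ot ot ot ot fp").split()
```

Calling `parenthesize(TAGS)` yields the structure shown in (iii), up to spacing, with seven left and seven right parentheses.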

In (iv), besides intervals, there are INTs and vns (i.e. v, v2), each in some position 0 to n. These relations can be expressed by pairs <i, j>, where i ≠ j and i, j ≥ 0. These pairs will be assigned to different sets.

By general convention, q is the position of the vn introduced by some INT, i.e. the non-coordinated vn associated to the ')' which closes the associated '(', or the first vn in a coordinated chain of vns, the coordinated chain which closes the INT. If INT = il, we express the relation by <q, 0>; if INT ≠ il and is in position p, by <p, q>.

Closing pairs (both <q, 0> and <p, q>) are in the set Cl[osing]. From <q, 0> ∈ Cl we can deduce that q ∈ R[oot], where R is either an empty set (see §3.3) or a singleton set with the position of the root vn as member.

Coordinated vns in verbal phrases, which are in chains vn_w1 ... vn_wn, are denoted by coordination pairs <wi, w1>, where wi ≠ w1. Coordination pairs of verbal phrases are in the set C-sv. A vn in position q and closing some int ≠ il can be the verbal form of a sentence coordinated to the root sentence (e.g. auraient-dû <24;v> in (ii)). In this case, we write <q, 0> and the pair belongs to the C-r set. With these conventions, from (iv) we obtain (v).

(v) Cl = {<16, 0>, <1, 4>, <12, 14>, <17, 20>, <23, 24>}
    C-sv = {<6, 4>}
    R = {16}
    C-r = {<24, 0>}

Cl, C-sv and C-r being sets of pairs, from the union of them it is possible to deduce a partial graph, positions in the input being its vertices.
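As an illustration of how the sets in (v) relate to the tags of (iii), the sketch below derives Cl, C-sv and R with a stack of open introducers and then forms the partial graph as the union of the pair sets. It is a hedged sketch, not the paper's algorithm: the function name `closing_sets` is ours, each v1 or v2 is assumed to coordinate with the closest vn to its left, and C-r is copied from (v) because tags alone do not tell a nominative introducer such as ils from one such as que.

```python
def closing_sets(tags):
    """Return (Cl, C-sv, R) as sets, reading tags left to right."""
    cl, c_sv, root = set(), set(), set()
    stack = []   # positions of introducers not yet closed
    head = None  # first vn of the chain being read, if any
    for pos, tag in enumerate(tags):
        if tag in ("il", "int"):
            stack.append(pos)
            head = None
        elif tag in ("v1", "v2") and head is not None:
            c_sv.add((pos, head))          # coordination pair <wi, w1>
        elif tag in ("v", "v1", "v2"):
            head = pos                     # this vn closes the innermost open INT
            if stack:
                p = stack.pop()
                if tags[p] == "il":
                    cl.add((pos, 0))       # <q, 0>: q is the root vn
                    root.add(pos)
                else:
                    cl.add((p, pos))       # <p, q>
    return cl, c_sv, root

# Tags of example (i), positions 0 to 28, followed by fp.
TAGS = ("il int ot ot v ot v2 ot ot ot ot ot int ot v ot v "
        "int ot ot v ot ot int v ot ot ot ot fp").split()

CL, C_SV, ROOT = closing_sets(TAGS)
C_R = {(24, 0)}                            # taken from (v), not computed here
EDGES = CL | C_SV | C_R                    # arcs of the partial graph
VERTICES = {p for edge in EDGES for p in edge}
```

With the tags of (i), the computed Cl, C-sv and R coincide with (v), and the graph has eleven vertices among the thirty positions.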

The parsing system that obtains the elements in (v), given the input expression in (i), is organized in Modules I and II. Module I, succinctly presented here, has as input a chain of Ascii [...] with one or more segmented and enumerated sentences. An interface obtains (iii) from (ii). The challenge of Module I is the disambiguation of expressions such as si or la juge, which may or may not be ints or vns, respectively. It is obtained by exploring local contexts. Module II is expressed in two different ways. There is an algorithm (Algof- ) which from (iii) obtains (v). The other way is a plain declarative one, making use of CHR constraints.

3 CHR constraints

CHR (Frühwirth and Abdennadher, 2003) is a very powerful multiset rewriting language. Constraints, viewed as pieces of partial information, are formalized as distinguished, predefined predicates in first-order predicate logic.

A constraint program successively generates constraints as it runs, until a solution is found to the problem or no more constraints can be generated. Rules describe how to generate new constraints from those already generated. For instance, we can view symbols in a grammar as constraints upon word boundaries in an input string. Thus an interesting morning could be parsed by CHR rules such as:

(1) an(X,Y) ==> det(X,Y).
(2) interesting(X,Y) ==> adj(X,Y).
(3) morning(X,Y) ==> noun(X,Y).
(4) start ==> an(1,2), interesting(2,3), morning(3,4).
(5) det(X,Y), adj(Y,Z), noun(Z,W) ==> np(X,W).

The contiguity and order of the det, adj and noun are ensured by the word boundaries; e.g., the det ends where the adj starts, at point Y. GC is the set of all the generated constraints derivable from the input string defined through (1) to (4). It will be generated upon the query: ?- start. In GC, we can then select those that solve the problem we are interested in (in this case np(1,4), which tells us our string analyses into an np).
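To make the generation of GC concrete without a CHR system, rules (1) to (5) can be imitated by naive forward chaining in Python. This is only a sketch of the propagation idea, not CHR semantics: constraints are plain tuples, the names `LEXICON` and `saturate` are ours, and the `start` constraint itself is not stored.

```python
LEXICON = {"an": "det", "interesting": "adj", "morning": "noun"}  # rules (1)-(3)

def saturate(words):
    """Generate constraints until no rule adds anything new."""
    # Rule (4): each word becomes a constraint over its two word boundaries.
    gc = {(w, i, i + 1) for i, w in enumerate(words, start=1)}
    while True:
        new = set()
        for name, x, y in gc:
            if name in LEXICON:                        # rules (1)-(3)
                new.add((LEXICON[name], x, y))
            if name == "det":                          # rule (5)
                for n2, y2, z in gc:
                    if n2 == "adj" and y2 == y:        # adj starts where det ends
                        for n3, z2, w in gc:
                            if n3 == "noun" and z2 == z:
                                new.add(("np", x, w))
        if new <= gc:                                  # fixpoint reached
            return gc
        gc |= new

GC = saturate(["an", "interesting", "morning"])
```

The resulting set contains the three word constraints, the three category constraints and `("np", 1, 4)`, the constraint selected as the solution in the text.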

3.1 CHR constraints and grammar rules

CHR rules can directly mirror grammar rules: (5) in §3 mirrors np → det adj noun, assuming contiguity between det adj noun. CHR rules generate constraints, bottom up and left [...]

[...] all vn (i.e. v, v1, v2) can instantiate vn_i or vn_j. For instance, assuming h to the left of i:

(1) If vn_i = v2 and it coordinates with vn_h = v or v1, then vn_j ≠ v2.
(2) If vn_i = v2 and it does not coordinate with vn_h, or if it coordinates to a vn_h = v2, then vn_j may be a v2.

The grammar which mirrors CHR rules is thus a grammar of type 1 in the Chomsky hierarchy. The format of the input of CHR rules is

il ... x_1 ... x_n ... x_m ... fp

where x_i is either an int or a vn, and '...' is, here, either an interval or e (empty). The basic challenge of CHR rules is to specify the constraints in GC from which arcs in the partial graph can be obtained.

3.2 Arcs from constraints

The whole set GC is not needed for obtaining arcs. GC is thus the domain of partial functions specifying pairs of the resulting graph in its range. As an illustration on verbal coordination, consider a regardé, regarde et regardera ce tableau, with (vi) as its CHR-rules input.

(vi) v(1,2), v1(2,3), v2(3,4), ot(4,5), ot(5,6).

Several grammar rules specify different types of verbal coordination, two of them underlying the specification of the C-sv set related to (vi):

vO 1 → v v1, which coordinates v1 to v, obtaining vO 1
vO → v v1 v2, which coordinates v1 and v2 to v, obtaining vO

The CHR rules obtain the GC (vii) from (vi).

(vii) v(1,2), v0(1,2), v1(2,3), v1R(2,3), v0 1(1,3), v1NT(2,3), v(1,2), v2(3,4), v2R(3,4), v1 (2,4), v0 (1,4), ot(4,5), ot(5,6), otR(4,6), v(2,5), v(1,5), ? yes

Not all the information in (vii) is needed for obtaining elements in C-sv. For instance, intervals, expressed by otR, are not significant for [...] partial functions with GC as domain, we have F1 and F2. Given (vii), <2,1>, <3,1> ∈ C-sv are obtained by F1 and F2, respectively.

F1: v0 1(X,Z), v(X,Y), v1R(Y,Z) ∈ GC → <(Z-1), X> ∈ C-sv

F2: v0 (X,W), v0 1(X,Z), v2(Z,W) ∈ GC → <Z, X> ∈ C-sv
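F1 and F2 can be read as joins over the tuples of GC. The sketch below applies them to (vii) in Python. Constraint names are copied from the text as extracted; the spellings `v0_1`, `v0_` and `v1_` are our stand-ins for symbols printed with a gap ("v0 1", "v0 ", "v1 "), so the underscore marks a character that may have been lost and is an assumption.

```python
# GC (vii) as a set of (name, left boundary, right boundary) tuples.
GC = {
    ("v", 1, 2), ("v0", 1, 2), ("v1", 2, 3), ("v1R", 2, 3), ("v0_1", 1, 3),
    ("v1NT", 2, 3), ("v2", 3, 4), ("v2R", 3, 4), ("v1_", 2, 4), ("v0_", 1, 4),
    ("ot", 4, 5), ("ot", 5, 6), ("otR", 4, 6), ("v", 2, 5), ("v", 1, 5),
}

def f1(gc):
    """F1: v0_1(X,Z), v(X,Y), v1R(Y,Z) in GC  ->  <Z-1, X> in C-sv."""
    return {(z - 1, x)
            for (n1, x, z) in gc if n1 == "v0_1"
            for (n2, x2, y) in gc if n2 == "v" and x2 == x
            if ("v1R", y, z) in gc}

def f2(gc):
    """F2: v0_(X,W), v0_1(X,Z), v2(Z,W) in GC  ->  <Z, X> in C-sv."""
    return {(z, x)
            for (n1, x, w) in gc if n1 == "v0_"
            for (n2, x2, z) in gc if n2 == "v0_1" and x2 == x
            if ("v2", z, w) in gc}

C_SV = f1(GC) | f2(GC)
```

Both functions are partial in the sense of the text: on a GC that lacks the required constraints they simply return the empty set.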

Another example illustrates how the Cl set is obtained. Consider the embedded sentences (dit) que_1 la_2 fille_3 que_4 Jacques_5 a-regardée_6 est-partie_7, with (viii) as its input to CHR rules.

(viii) int(1,2),..., int(4,5),..., v(6,7), ...,v(7,8)

(1) and (2) in (ix) compact several grammar rules. In (1), X is f (constituant fermé, i.e. closed constituent) or e, and iNT rewrites as int ot*, ot* expressing intervals or e. In (2), Y makes contextual restrictions explicit (see §3.1), while Z rewrites v (complexe verbal, i.e. verbal complex), obtaining terminal strings (a vn or a vn coordination) with an initial vn.

(ix) (1) f → iNT X v
     (2) v/Y → Z ot*

There are CHR rules which mirror (ix). In (x), (1) is a subset of the GC obtained by them, and (2) is the partial function F3. Given (viii), (1) and (2), (3) is obtained.

(x) (1) { f(x,y), v(y,z) }
    (2) F3: f(X,Y), v(Y,Z) ∈ GC → <X,Y> ∈ Cl
    (3) <4,6>, <1,7> ∈ Cl
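F3 has the same shape as the coordination functions and can be sketched the same way. In the Python sketch below, v(6,7) and v(7,8) come from (viii), while the two f constraints f(4,6) and f(1,7) are not printed in the text: they are inferred here from result (3) and should be read as an assumption.

```python
# Assumed instance of (x)(1); the two "f" tuples are inferred, not quoted.
GC_SUBSET = {("f", 4, 6), ("v", 6, 7), ("f", 1, 7), ("v", 7, 8)}

def f3(gc):
    """F3: f(X,Y), v(Y,Z) in GC  ->  <X, Y> in Cl."""
    return {(x, y)
            for (n1, x, y) in gc if n1 == "f"
            if any(n2 == "v" and y2 == y for (n2, y2, _z) in gc)}

CL = f3(GC_SUBSET)
```

Under that assumption, the function returns the two closing pairs listed in (3).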

3.3 Deduced information on intervals

From arcs obtained by partial functions from GC, besides the possibility of expressing the semantic representation of verbal coordination, other interesting information can be deduced. For instance, if <p, 0> ∉ Cl, then R = {}, and it is likely that the expression will be a nominal phrase, even with one or more embedded relatives. Furthermore, deduced parentheses associated to particular input symbols specify intervals with inherent restrictions, as in (xi).

(xi) (1) (int_i ... (int_j
     (2) (int_i ... VFORM, where VFORM is vn_j) or ((vn_j
     (3) vn_i ... vn_j

[...] a vn in some position to the right of position j, while that is not the case with (2) or (3).

4 Ongoing work and discussion

Ongoing work relates to the extension of verbal coordinations, to the improvement of the expressive power of the grammar (today with less expressive power than Algof- , see §3) and its CHR implementation and, last but not least, to the evaluation on real texts of the underlying linguistic hypothesis, knowing beforehand that neither all ints nor all vns can be obtained by Module I, nor all verbal coordination be handled by Module II.

Even if we know this, we claim that, in general, from poor and local information obtained by Module I, thanks to the deducible character of balanced parentheses in NL texts, it is possible to obtain a partial graph of the whole sentence, with good and effective approximations in the NL-software engineering domain. From this, besides verbal coordination, it is possible to deduce, in turn, restrictions on intervals, which will reduce the parsing search space (see footnote 2).

References

Luísa Coheur, Nuno Mamede, and Gabriel G. Bès. 2003. Asdecopas: a syntactic-semantic interface. In EPIA'03 Workshop on Natural Language and Text Retrieval, Evora (Portugal).

T. Frühwirth and S. Abdennadher. 2003. Essentials of Constraint Programming. Springer Verlag.

Jaakko Hintikka and Gabriel Sandu. 1997. Game theoretical semantics. In Johan van Benthem and Alice ter Meulen, editors, Handbook of Logic and Language, pages 361–410. Elsevier.

Jaakko Hintikka. 1994. Fondements d'une théorie du langage. PUF.

Richard Montague. 1974. Universal grammar. In Richmond Thomason, editor, Formal Philosophy, selected papers of Richard Montague, pages 222–246. Yale University Press.

Footnote 2: The conjecture is that the analysis of intervals will obtain new arcs and vertices, and from them an oriented graph on the sentence, from which, in turn, semantic functions will obtain the semantic representation, cf. (Coheur et al., 2003), but, following (Hintikka, 1994) and (Hintikka and Sandu, 1997), we do not claim that from balanced [...]
