HAL Id: hal-01096737
https://hal.archives-ouvertes.fr/hal-01096737
Submitted on 18 Dec 2014
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Balanced parentheses in NL texts : a useful cue in the
syntax/semantics interface
Gabriel G. Bès, Veronica Dahl
To cite this version:
Gabriel G. Bès, Veronica Dahl. Balanced parentheses in NL texts : a useful cue in the syntax/semantics
interface. Workshop on Prospects and Advances in the Syntax/Semantics Interface, 2003, Nancy,
France. �hal-01096737�
a useful ue in the syntax/semanti s interfa e Gabriel G. Bès y UniversitéBlaise-Pas al GRIL Veroni a Dahl z Simon Fraser University ComputingS ien es Department
Abstra t
Balan ed parentheses on text sen-ten es anbe obtainedfrom informa-tion on parti ular morphemes the introdu ers and on ine ted verbal forms. From balan ed parentheses, a partial graph of the senten e in the semanti s interfa e an be dedu ed, alongwithotherinformation. The hy-pothesis anditsexpressionwithCHR onstraintsarepresented.
1 The basi hypothesis
Many formal languages use parentheses, left ones (lp) and right ones (rp). They are bal-an ed: at the end of a well formed expression N(lp)=N(rp)(whereN:number),and,atany pointofit,N(lp)N(rp).
Parenthesesarewellidentiedobje tsin for-mal languages. Classied under the label of "auxiliary symbols" they do not have intrin-si semanti value, but they are ru ially im-portant for the spe i ation of operators do-mains. In Montague Grammar (Montague, 1974), their expressive power is even greater: indexedparenthesesen odethesynta ti oper-ationfromwhi htheyfollow.
Parentheses are widely used in formal or quasiformalsynta ti representations. But,as pointedoutbyHintikka(Hintikka,1994),they arenot "natural"obje ts. Theybelong to the synta ti ma hinery of the metalanguageused to des ribe NLexpressions, and as su h, their
Authors in alphabeti al order. Thanks are giventoCarolineHagègeforextendedand enlight-eningdis ussionsinthepreliminairiesofthiswork, andtoFrançoisTrouilleuxfor his ommments.
y
Gabriel.Besuniv-bp lermont.fr; 34 Ave. Carnot,F63037Clermont-Fd edex.
z
veroni asfu. a;8888 UniversityDr. Burnaby
use an freely hange from one ma hinery to another,eveniftheyremainbalan ed.
Butnotallformal languagesneed parenthe-ses asauxiliary symbols. The polish notation ofrst orderlogi doesnotrequirethem. Our entral hypothesis is a kind of an answer to Hintikka's hallengeobje tion: balan ed paren-theses anindeedbededu ed andnot stipu-lated from an adequate analysis of NL ex-pressions. Furthermore, theylead to apartial graph,whi h anbeused asanimportant ue inthesyntax/semanti interfa e.
Balan ed parentheses anbe obtained from anadequateanalysisofasubsetofgrammati al morphemes su h as Fren h si, que,..., the in-trodu ers,andine ted verbalforms,ine ted hunks(e.g. alu,luiadonné,orine tedverbs (e.g. aimait,parle). Metaphori ally,theyallow tojumpto theroofofasenten efrompoor in-formationonlo al marksinitsfoundations.
2 From lo al information to the partial graph
Thebalan edparentheseshypothesis anbe il-lustrated bythe following (i), analyzedby the subsequent(ii)to (v).
i Silesparentss'étaientmisd'a ordhieret avaient bien onnu la réglementation, les bureau ratesàquiils sesontadressés au-jourd'huineleurauraientpasréponduque 'étaitimpossible,ilsauraientdûprésenter leurdossierautrement.
With respe tto(i),itis possibletosaythat thereareine tednu learverbalphrases(vn), assesont adressés,that in one ase, twovns oordinate(s'étaient mis d'a ord and avaient bien onnu),thatthisverbal oordinationisthe verbal form of the onditional senten e, that the verb form of the root senten e is ne leur
withauraient dû asverbalformis oordinated totherootsenten e.
Futhermore, it is possible to say that there aremorphologi alexpressions,simple(assi)or omplex (as à qui), whi h ag a oming vn; these aretheintrodu ers. Forinstan ese sont adressés isintrodu edbyàqui,auraient dû by the nominativeform ils. The rst vn of the oordinatedverbalformofthe onditional sen-ten e is introdu ed by si and the vn of the rootsenten eisintrodu edbyahiddenil (ini-tial limit), assumed asrstelementof any ex-pression,asfp(nal point)isthenalone.
Ifweintrodu eanlp attheleft ofil and an rp at the right of fp, asso iate an lp to ea h introdu er and an rp to ea h introdu ed vn not oordinated with another vn, and if we asso iatean lp to the rstvn of oordinated vnsandtworptotherightofthelast oordi-natedvn, balan ed parentheses on thewhole senten eareobtained.
Thus,from(i),thefollowing(ii)isobtained. In(ii),'-'joinssingleexpressionsin(i), obtain-ing hunkexpressionswhi hare omputed,with respe ttopositionandtags,asthesimpleones. Apositionisassignedin(ii)toea hexpression jointlywithatagfromaveryrestri ted vo ab-ularyV ={int,v, v1, v2,il,fp,ot}.
Besidesil and fp, presentedearlier,int inV is asso iated to introdu ers, v is asso iated to vnsnotimmediatelypre eededeitherby','or bya oordinationform,v1 isasso iatedtovns immediatelypre eededby','v2 isasso iatedto vnsimmediatelypre eededbya oordination form(aset, ou...) notpre eededbya',',andot (other)isasso iatedtoanyexpressionwhi his notasso iatedtooneoftheprevioustags. INT willspellbothint andil.
ii ((il <0;il> (Si <1;int> les <2;ot> parents <3;ot> (s'-étaient-mis-d'a ord <4;v> hier <5;ot> et-avaient-bien- onnu <6;v2> )) la <7;ot> réglementation <8;ot> , <9;ot> les <10;ot> bureau rates <11;ot> (à-qui <12;int> ils <13;ot> se-sont-adressés <14;v> ) aujourd'hui <15;ot> ne-leur-auraient-pas-répondu <16;v> ) (que <17;int> la <18;ot> hose <19;ot> était <20;v> )impossible <21;ot> , <22;ot> (ils <23;int> auraient-dû <24;v> ) présenter <25;ot> leur <26;ot> dossier <27;ot> autrement <28;ot> pf)
IfweeliminateNL expressions,leaving only
'...',spellingintervals,substitutesforots.
iii ((int 0 (int 1 ot 2 ot 3 (v 4 ot 5 v2 6 ))ot 7 ot 8 ot 9 ot 10 ot 11 (int 12 ot 13 v 14 ) ot 15 v 16 ) (int 17 ot 18 ot 19 v 20 )ot 21 ot 22 (int 23 v 24 )ot 25 ot 26 ot 27 ot 28 fp) iv ((int 0 (int 1 ...(v 4 ...v2 6 ))...(int 12 ...v 14 ) ...v 16 )(int 17 ...v 20 )...(int 23 v 24 )...fp)
In (iv), besides intervals, there are INTs and vns (i.e. v, v2) ea h in some position 0 to n. These relations an be expressed by pairs<i, j>, where i 6=j andi, j 0. These pairswillbeassignedto dierentsets.
Bygeneral onvention,qisthepositionofthe vn introdu ed bysomeINT, i.e. the non o-ordinatedvnasso iatedtothe')'whi h loses theasso iated'(', ortherst vn in a oordi-nated hain of vns, oordinated hain whi h losestheINT.IfINT=il, weexpressthe re-lationby<q,0>,ifINT6=il andinpositionp, by<p,q>.
Closingpairs(both<q, 0>and<p,q> ) are inthesetCl[osing℄. From<q,0>2Cl we an dedu e that q 2 R[oot℄, where R is either an emptyset(see3.3)orasingletonsetwiththe positionoftherootvn asmember.
Coordinated vns in verbal phrases, whi h are in hains vn
w1
...vn wn
, are denoted by oordination pairs<w1, wi>, where wi 6= w1. Coordinationpairsofverbalphrasesarein the setC-sv. Avn in positionq and losingsome int 6= il, an be the verbalform of asenten e oordinatedtotherootsenten e(e.g. auraient-dû
<24;v>
in(ii)). Inthis ase,wewrite<q,0> andthepairbelongstotheC-r set. Withthese onventions,from(iv)weobtain(v).
(v) Cl = {<16, 0>, <1, 4>, <12, 14>, <17,20>,<23,24>}
C-sv={<6,4>} R={16}
C-r={<24,0>}
Cl,C-svandC-r beingsetsofpairs,fromthe unionofthem itispossibletodedu eapartial graph,positionsin theinputbeingitsverti es.
Theparsingsystemthatobtainstheelements in(v),giventheinputexpressionin(i),is orga-nizedinModulesIandII.ModuleI,su in tly presented here, has as input a hain of As ii
with one or more segmented and enumerated senten es. An interfa e obtains (iii) from (ii). The hallenge of Module I is the disambigua-tionof expressions su h assi or la juge whi h anbe ornot ints or vns, respe tively. It is obtained by exploring lo al ontexts. Module II is expressed in twodierent ways. There is analgorithm(Algof- )whi h from(iii)obtains (v). Theother way is aplain de larativeone, makinguseofCHR onstraints.
3 CHR onstraints
CHR (Frühwirth and Abdennadher, 2003) is a very powerful multiset rewriting language. Constraints, viewed as pie es of partial infor-mation,areformalizedasdistinguished, prede-nedpredi atesinrst-orderpredi atelogi .
A onstraintprogram su essively generates onstraintsasit runs,untilasolutionisfound to the problem or no more onstraints an be generated. Rulesdes ribehowtogeneratenew onstraintsfrom those alreadygenerated. For instan e,we anviewsymbolsinagrammaras onstraintsupon word boundariesin an input string. Thus an interesting morning ould be parsedbyCHRrulessu h as:
(1) an(X,Y) ==> det(X,Y).
(2) interesting(X,Y) ==> adj(X,Y). (3) morning(X,Y) ==> noun(X,Y). (4) start ==> an(1,2),
interesting(2,3), morning(3,4). (5) det(X,Y), adj(Y,Z), noun(Z,W) ==> np(X,W).
The ontiguityandorderofthedet,adj and nounareensuredbythewordboundaries;e.g., thedet ends where theadj starts,at pointY. GC is the set of all the generated onstraints derivablefromtheinputstringdenedthrough (1) to (4). It will be generated upon the query:?- start. In GC, we an then sele t thosethat solvetheproblem weare interested in(inthis asenp(1,4),whi htellusourstring analysesintoanp).
3.1 CHR onstraintsand grammar rules
CHR rules an dire tlymirror grammarrules: (5)in3mirrorsnp ! det adj noun, assum-ing ontiguity between det adj noun. CHR rules generate onstraints, bottom upand left
i j
allvn (i.e. v, v1, v2) aninstantiate vn i
or vn
j
. Forinstan e,assumingh totheleftofi:
(1) If vn i
= v2 and it oordinates with vn h =vorv1,thenvn j 6=v2. (2) If vn i
= v2 and it does not oordi-nate with vn h , or if it oordinates to a vn h =v2,thenvn j maybeav2 .
Thegrammarwhi hmirrorsCHRRulesisthus agrammaroftype1intheChomskyhierar hy
1 . TheformatoftheinputofCHRrulesis
il...x 1 ...x n ...x m ...fp wherex i
iseitheranintoravn,and'...' is, here, either an interval ore(mpty). The basi hallenge of CHR rules is to spe ify the on-straints in GC from whi h ar s in the partial graph anbeobtained.
3.2 Ar s from onstraints
Thewhole set GC is notneeded forobtaining ar s. GC is thus the domain of partial fun -tionsspe iyingpairsoftheresultinggraphin its range. As an illustration onverbal oordi-nation, onsideraregardé,regarde etregardera etableau,with(vi)asitsCHR-rulesinput.
(vi) v(1,2), v1(2,3), v2(3,4), ot(4,5), ot(5,6).
Severalgrammarrulesspe ifydierenttypes ofverbal oordination,twoofthemunderlying thespe i ationoftheC-svsetrelatedto(vi): vO 1! vv1,whi h oordinatesv1 tov ob-tainingvO 1
vO !vv1 v2,whi h oordinatesv1 andv2 tov obtainingvO
TheCHRrulesobtaintheGC (vii)from(vi).
(vii) v(1,2), v0(1,2), v1(2,3), v1R(2,3), v0 1(1,3), v1NT(2,3), v(1,2), v2(3,4), v2R(3,4), v1 (2,4), v0 (1,4), ot(4,5), ot(5,6), otR(4,6), v(2,5), v(1,5), ? yes
All the informations in (vii) are not needed forobtainingelementsinC-sv. Forinstan e, in-tervals,expressedbyotR,arenotsigni antfor
1
partialfun tions with GC asdomain, wehave F1 andF2. Given(vii), <2,1>,<3,1>2C-sv areobtainedbyF1 andF2, respe tively.
F1: v0 1(X,Z),v(X,Y), v1R(Y,Z)2GC !<(Z-1),X>2C-sv
F2: v0 (X,W),v0 1(X,Z),v2(Z,W)2GC !<Z,X>2C-sv
Anotherexampleillustratestheobtentionof the Cl set. Consider the embedded senten es (dit) que 1 la 2 lle 3 que 4 Ja ques 5 a-regardée 6 est-partie 7
with(viii)asitsinputtoCHRrules.
(viii) int(1,2),..., int(4,5),..., v(6,7), ...,v(7,8)
(1)and(2)in (ix) ompa tseveralgrammar rules. In (1), X is f, ( onstituant fermé), or e, andiNT rewrites asint ot*, ot* expressing intervalsore. In(2), Y expli it ontextual re-stri tions(see3.1), whileZ rewrites v ( om-plexeverbal),obtainingterminalstrings(avn oravn oordination)withaninitial vn.
(ix) (1) f !iNT X v (2) v/Y !Zot*
There are CHR rules whi h mirror (ix). In (x),(1)isasubsetoftheGCobtainedbythem, (2)is thepartialfun tion F3. Given(viii),(1) and(2),(3)isobtained.
(x) (1){ f(x,y), v(y,z)}
(2)F3: f(X,Y), v(Y,Z)2GC !<X,Y>2Cl
(3)<4,6>,<1,7> 2Cl
3.3 Dedu ed informationon intervals
From ar s obtained by partial fun tions from GC, besides the possibility of expressing the semanti representationofverbal oordination, otherinterestinginformations anbededu ed. Forinstan e,if<0,p>62Cl,thenR={},and itislikelythattheexpressionwillbeanominal phrase,even withoneormoreembedded rela-tives. Furthermore, dedu ed parentheses asso- iatedtoparti ularinputsymbolsspe ify inter-valswithinherentrestri tions,asin(ix).
(xi) (1)(int i ...(int j (2) (int i
...VFORM, where VFORM is vn j )or ((vn j (3)vn i ...vn j
avn insomepositionto therightof position j,whilethatisnotthe asewith(2)or(3).
4 Ongoing work and dis ussion
Ongoing work relates to the extension of ver-bal oordinations, to the improvement of the expressive power of the grammar (today with lessexpressivepowerthanAlgof- , see3) and itsCHR implementation and, last but notthe least,to theevaluationinee tivetexts ofthe underlying linguisti hypothesis, knowing be-forehand that neither all ints or all vns an beobtainedbyModuleI,norallverbal oordi-nationbehandledbyModuleII.
Evenifweknowthis, we laim that, in gen-eral,from poorand lo alinformationobtained byModuleI,thankstothededu tible hara ter ofbalan edparenthesesinNLtexts,itis possi-bleto obtainapartial graphof thewhole sen-ten e, with good and ee tive approximations in the NL-softwareengineering domain. From this, besides verbal oordination, it is possi-bletodedu e,inturn,restri tionsonintervals, whi h willredu etheparsingresear hspa e
2 .
Referen es
Luísa Coheur, Nuno Mamede, and Gabriel G. Bès. 2003. Asde opas: asynta ti -semanti interfa e. InEPIA'03 Workshop on Natural Language and Text Retrieval, Evora (Portu-gal).
T. Frühwirth and S. Abdennadher. 2003. Essentials of Constraint Programming. SpringerVerlag.
Jaako Hintikka and Gabriel Sandu. 1997. Game theoreti al semanti s. In van Jo-han Benthem andAli e ter Meulen,editors, HandbookofLogi andLanguage,pages361 410.Elsevier.
Jaako Hintikka. 1994. Fondements d'une théoriedulangage. PUF.
Ri hard Montague. 1974. Universalgrammar. InRi hardThomason,editor,Formal Philos-ophy, sele ted papers of Ri hard Montague, pages222246.YaleUniversityPress.
2
The onje tureisthattheanalysisofintervals will obtain new ar s and verti es, and from them anoriented graphonthesenten e, fromwhi h,in turn, semanti fun tions will obtain the semanti representation, f. (Coheur etal., 2003), but, fol-lowing (Hintikka,1994)and (HintikkaandSandu, 1997),wedonot laim that frombalan ed