HAL Id: hal-00614661
https://hal.archives-ouvertes.fr/hal-00614661
Submitted on 19 Dec 2011
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Cross-framework Grammar Engineering using Constraint-driven Metagrammars
Denys Duchier, Yannick Parmentier, Simon Petitjean
To cite this version:
Denys Duchier, Yannick Parmentier, Simon Petitjean. Cross-framework Grammar Engineering using
Constraint-driven Metagrammars. 6th International Workshop on Constraint Solving and Language
Processing (CSLP’11), Sep 2011, Karlsruhe, Germany. pp.32-43. ⟨hal-00614661⟩
LIFO, Université d'Orléans, F-45067 Orléans Cedex 2, France,
firstname.lastname@univ-orleans.fr,
WWW home page: http://www.univ-orleans.fr/lifo/
Preprint
Abstract. In this paper, we present an abstract constraint-driven formalism for grammar engineering called eXtensible MetaGrammar and show how to extend it to deal with cross-framework grammar engineering. As a case study, we focus on the design of tree-adjoining, lexical-functional, and property grammars (TAG / LFG / PG). A particularly interesting feature of this formalism is that it makes it possible to apply specific constraints on the linguistic structures being described.
Keywords: computational linguistics, formal grammar, metagrammar, constraint solving.
1 Introduction
Many grammatical frameworks have been proposed over the last decades to describe the syntax of natural language. Among the most widely used, one may cite Tree-Adjoining Grammar (TAG) [1], Lexical-Functional Grammar (LFG) [2], or Head-driven Phrase Structure Grammar (HPSG) [3]. These frameworks are of both theoretical and practical interest. From a theoretical point of view, they provide a formal device for the linguist to experiment with her/his theories. From a practical point of view, they make it possible to automatically process natural language in applications such as dialog systems, machine translation, etc. They differ in their expressivity and complexity. Some prove more adequate for the description of a given language than others. Still, for many of these frameworks, large resources (i.e., grammars) have been designed, at first by hand, and later via dedicated tools (e.g., integrated grammar environments such as XLE for LFG [4]). In this paper, we are concerned with this complex task of grammar engineering, keeping in mind the two above-mentioned theoretical and practical interests.
Several approaches have been proposed for computer-aided grammar engineering, mainly to reduce the costs of grammar extension and maintenance. The main approaches are 1. the automatic acquisition from treebanks (see e.g., [5] for LFG), 2. systems based on an abstract description of the grammar, either via transformation rules, also known as metarules (see e.g., [6] for TAG) or via a description language, sometimes called metagrammar (see e.g., [7] for TAG). The advantage of the description-based approach (and especially metagrammars¹) over the automatic acquisition approach lies in the linguistic control it provides. Indeed, these descriptions capture linguistic generalizations and make it possible to reason about language at an abstract level. Describing language at an abstract level is not only interesting for structure sharing within a given framework, but also for information sharing between frameworks and/or languages.
This observation was already made by [9,10]. In their papers, the authors showed how to extend an existing metagrammar for TAG so that both a TAG and an LFG could be generated from it. They annotated TAG metagrammatical elementary units (so-called classes) with extra pieces of information, namely (i) LFG's functional descriptions and (ii) filtering information to distinguish common classes from classes specific to TAG or LFG. The metagrammar compilation then generated an extended TAG, from which LFG rules were extracted. To maximize the structure sharing between their TAG and LFG metagrammars, the authors defined classes containing tree fragments of depth one. These fragments were either combined to produce TAG trees or associated with functional descriptions to produce LFG rules. This cross-framework experiment was applied to the design of a French / English parallel metagrammar, producing both a TAG and an LFG. This work was still preliminary. Indeed, (i) it concerned a limited metagrammar (the target TAG was composed of 550 trees, and the associated LFG of 140 rules), and (ii) more importantly, there is no clear evidence that a generalization to other frameworks and/or languages would be possible (metagrammar implementation choices, such as tree fragment depth, were not independent from the target frameworks).
Here, we chose to adopt a more general approach by designing an extensible metagrammatical language that can handle an arbitrary number of distinct target frameworks. The linguist can thus use the same formalism to describe different frameworks and grammars. Nonetheless, if one wants to experiment with multi-formalism, e.g., by designing a parallel TAG / LFG grammar, nothing prevents her/him from defining universal classes, which contain metagrammatical descriptions built on a common sublanguage. Rather than designing a new metagrammatical language from scratch, we propose to extend an existing formalism, namely eXtensible MetaGrammar (XMG) [11], which seems particularly adequate thanks to its modularity and extensibility.
The paper is organized as follows. In section 2, we briefly introduce TAG, as well as the redundancy issues arising while developing large TAG grammars (which motivated metagrammars). We then introduce the XMG metagrammatical language and show how it can be used to design TAG grammars. In section 3, we briefly introduce LFG and present an extension of XMG to describe LFG grammars. In section 4, we introduce Property Grammar (PG) [12], and present a second extension of XMG to generate PG grammars. In section 5, we generalize over these two extensions, and define a layout for cross-framework grammar engineering. Finally, we conclude and give perspectives in section 6.
¹ In rule-based descriptions, one has to carefully define the ordering of the applications of rules [8], which makes it hard to design large grammars.
2 Generating Tree-Adjoining Grammars with a metagrammar
2.1 Tree-Adjoining Grammar
TAG is a tree rewriting system, where elementary trees can be combined via two rewriting operations, namely substitution and adjunction. Substitution consists in replacing a leaf node labelled with ↓ with a tree whose root has the same syntactic category as this leaf node. Adjunction consists in replacing an internal node with a tree where both the root node and one of the leaf nodes (labelled with ⋆) have the same syntactic category as this internal node. As an illustration, consider Fig. 1 below. It shows (i) the substitution of the elementary tree associated with the noun John into the elementary tree associated with the verb sleeps, and (ii) the adjunction of the elementary tree associated with the adverb deeply into the tree associated with sleeps.
[Figure 1: elementary trees S(NP↓ VP(V(sleeps))), NP(John) and VP(VP⋆ ADV(deeply)) → derived tree S(NP(John) VP(VP(V(sleeps)) ADV(deeply)))]
Fig. 1. Tree rewriting in TAG
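The two rewriting operations can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Node` class and the functions `substitute`, `adjoin`, `leaves` and `bracket` are hypothetical names introduced here, feature structures are ignored, adjunction is only attempted below the root, and the auxiliary tree is modified in place.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node of a TAG tree (hypothetical, illustrative representation)."""
    cat: str                                   # syntactic category or word
    children: List["Node"] = field(default_factory=list)
    subst: bool = False                        # substitution site (down arrow)
    foot: bool = False                         # foot node (star)


def substitute(tree: Node, initial: Node) -> bool:
    """Replace the first substitution leaf whose category is that of `initial`'s root."""
    for i, child in enumerate(tree.children):
        if child.subst and child.cat == initial.cat:
            tree.children[i] = initial
            return True
        if substitute(child, initial):
            return True
    return False


def find_foot(tree: Node) -> Optional[Node]:
    """Return the foot node of an auxiliary tree, if any."""
    if tree.foot:
        return tree
    for child in tree.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(tree: Node, auxiliary: Node) -> bool:
    """Adjoin `auxiliary` at the first internal node below the root with a matching category."""
    for i, child in enumerate(tree.children):
        if child.children and not child.foot and child.cat == auxiliary.cat:
            foot = find_foot(auxiliary)
            # The subtree of the adjunction site is re-attached under the foot node.
            foot.children, foot.foot = child.children, False
            tree.children[i] = auxiliary
            return True
        if adjoin(child, auxiliary):
            return True
    return False


def leaves(tree: Node) -> List[str]:
    """Left-to-right yield of the tree."""
    if not tree.children:
        return [tree.cat]
    return [word for child in tree.children for word in leaves(child)]


def bracket(tree: Node) -> str:
    """Bracketed notation of the tree."""
    if not tree.children:
        return tree.cat
    return f"{tree.cat}({' '.join(bracket(c) for c in tree.children)})"


# The three elementary trees of Fig. 1
sleeps = Node("S", [Node("NP", subst=True), Node("VP", [Node("V", [Node("sleeps")])])])
john = Node("NP", [Node("John")])
deeply = Node("VP", [Node("VP", foot=True), Node("ADV", [Node("deeply")])])

substitute(sleeps, john)   # (i) substitution of John
adjoin(sleeps, deeply)     # (ii) adjunction of deeply
```

After both operations, `bracket(sleeps)` gives the derived tree of Fig. 1 in bracketed form, and `leaves(sleeps)` its yield.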
A real-size TAG is typically made of thousands of elementary trees [14,15]. Due to TAG's extended domain of locality, many of these trees share common sub-trees, as for instance the relation between a canonical subject and its verb, as shown in Fig. 2. To deal with this redundancy, the metagrammar approach (in particular XMG) proposes to describe large TAG grammars in an abstract and factorized way.
2.2 eXtensible MetaGrammar (XMG)
XMG is a metagrammatical language inspired by logic programming. The idea behind XMG is that a metagrammar is a declarative logical specification of what a grammar is. This specification relies on the following three main concepts:
• several dimensions of language (e.g. syntax, semantics) can be described;
• for each of these dimensions, descriptions are made of non-deterministic combinations of elementary units;
• for some of these dimensions, descriptions must be solved to produce models.
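[Figure 2: two elementary trees sharing the fragment S(N↓ V⋄): the tree S(N↓ V⋄ N↓) for "Jean mange une pomme" (John eats an apple), and a relative-clause tree with nodes N⋆, S, C(que), S(N↓ V⋄) for "La pomme que Jean mange" (The apple that John eats)]
Fig. 2. Structural redundancy in TAG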
XMG's extensibility comes from the concept of dimensions. These make it possible to describe an arbitrary number of types of linguistic structures. Non-determinism allows for factorization, and description solving for assembly and validation (i.e., well-formedness of the description according to some target framework). Hereafter, we will first use XMG to describe TAG. Then, we will apply XMG's extensibility to the description of other frameworks, namely LFG and PG. Finally, we will generalize over these applications.
When describing TAG trees with XMG, one defines both (i) tree fragments and (ii) constraints that express how these fragments have to be combined to produce the grammar. Two languages are thus used: a description language LD to specify fragments, and a control language LC to specify combination constraints.
LD is based on the precedence and dominance relations. Furthermore, since TAG allows for the labelling of syntactic nodes with feature structures, so does LD. A description in LD is a formula built as follows:
Desc := x → y | x →+ y | x →∗ y | x ≺ y | x ≺+ y | x[f:E] | x(p:E) | Desc ∧ Desc
where x, y refer to node variables, → (resp. ≺) to the dominance (resp. precedence) relation, and + (resp. ∗) are used to denote the transitive (resp. reflexive and transitive) closure of this relation. The square brackets are used to associate a node variable with some feature structure. Parentheses are used to associate a node variable with some property (such as the TAG ⋆ property seen in Fig. 1). Note that node variables are by default local to a description. If a node variable needs to be accessed from outside its description, it is possible to use some export mechanism. Once a variable is exported, it becomes accessible using a dot operator. For instance, to refer to the variable x in the description Desc, one writes Desc.x. Here is an illustration of a fragment description in XMG, followed by a minimal model of this description:
(x[cat:S] → y[cat:V]) ∧ (x → z(mark:subst)[cat:N]) ∧ (z ≺ y)
          x[cat:S]
z↓[cat:N]          y[cat:V]
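To make the notion of model concrete, the following Python sketch checks whether a candidate tree satisfies a description. It is a simplification made for illustration, not XMG's actual solver: only immediate dominance (→) and immediate precedence between sisters (≺) are handled, and the encoding of trees and descriptions, as well as the name `satisfies`, are assumptions introduced here.

```python
# A candidate model maps each node to the ordered list of its daughters.
model = {"x": ["z", "y"], "z": [], "y": []}
other = {"x": ["y", "z"], "z": [], "y": []}

# The description (x -> y) and (x -> z) and (z < y) as (relation, left, right) triples.
description = [("dom", "x", "y"), ("dom", "x", "z"), ("prec", "z", "y")]


def satisfies(tree, description):
    """Check immediate dominance and immediate precedence between sisters."""
    parent = {child: mother for mother, kids in tree.items() for child in kids}
    for relation, a, b in description:
        if relation == "dom" and parent.get(b) != a:
            return False
        if relation == "prec":
            sisters = tree.get(parent.get(a), [])
            # b must be the sister immediately to the right of a
            if b not in sisters or sisters.index(b) != sisters.index(a) + 1:
                return False
    return True
```

Here `model` corresponds to the minimal model given above, whereas `other` places the verb before the noun and is therefore rejected.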
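LC offers three mechanisms to handle fragments: abstraction via parameterized classes (association of a name and zero or more parameters with a content), conjunction (accumulation of contents), and disjunction (non-deterministic accumulation of contents). A formula in LC is built as follows: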
Class := Name[p1, . . . ,pn]→Content
Content := Desc | Name[. . .] | Content∨Content | Content∧Content
As an illustration of LC, let us consider different object realizations. One could for instance define the 4 fragments (i) canonical subject, (ii) verbal morphology, (iii) canonical object, and (iv) relativized object, and the following combinations, thus producing the two trees of Fig. 2:
Object →CanObj ∨ RelObj
Transitive →CanSubj ∧ VerbMorph ∧ Object
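The non-deterministic reading of these two classes can be sketched as follows in Python. The encoding of contents as strings and `("and", ...)` / `("or", ...)` pairs and the function `expand` are assumptions made for this illustration; fragments are kept as plain names rather than tree descriptions, and class parameters are left out.

```python
from itertools import product

# A content is a fragment name, a class name, ("and", [...]) or ("or", [...]).
classes = {
    "Object": ("or", ["CanObj", "RelObj"]),
    "Transitive": ("and", ["CanSubj", "VerbMorph", "Object"]),
}


def expand(content):
    """Return every accumulation of fragments licensed by `content`."""
    if isinstance(content, str):
        if content in classes:          # class call: expand its content
            return expand(classes[content])
        return [[content]]              # elementary fragment
    operator, parts = content
    alternatives = [expand(part) for part in parts]
    if operator == "or":
        # disjunction: any alternative of any disjunct
        return [alt for alts in alternatives for alt in alts]
    # conjunction: one alternative per conjunct, accumulated
    return [sum(choice, []) for choice in product(*alternatives)]


accumulations = expand("Transitive")
```

`accumulations` contains two lists of fragments, one per object realization; each of them would then be handed to the description solver to yield one of the trees of Fig. 2.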
Metagrammar compilation. To produce a grammar from an XMG metagrammar, we let the logical specification generate, in a non-deterministic way, descriptions. In other words, the combination constraints are processed to generate descriptions (one per dimension). For some dimensions, descriptions need to be solved to produce models. This is the case for TAG: a constraint-based tree description solver is used to compute trees [11]. Note that this solver actually checks several types of constraints [16]: tree well-formedness constraints, TAG-related constraints (e.g., unique node labelled ⋆), and language-related constraints (e.g., uniqueness and order of clitics in French).
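As an example of a TAG-related constraint, the uniqueness of the node labelled ⋆ can be checked on a computed tree as sketched below. The tuple encoding `(label, mark, daughters)` and the function names are hypothetical, and the check is applied after the fact here, whereas a constraint solver would enforce it during solving.

```python
def foot_labels(tree):
    """Collect the categories of all nodes marked as foot in a (label, mark, daughters) tree."""
    label, mark, daughters = tree
    found = [label] if mark == "foot" else []
    for daughter in daughters:
        found += foot_labels(daughter)
    return found


def is_valid_auxiliary(tree):
    """Exactly one foot node, carrying the same category as the root."""
    return foot_labels(tree) == [tree[0]]


good = ("VP", None, [("VP", "foot", []), ("ADV", None, [("deeply", None, [])])])
bad = ("VP", None, [("VP", "foot", []), ("VP", "foot", [])])
```

The tree `good` is the auxiliary tree for deeply of Fig. 1; `bad` carries two foot nodes and is rejected.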
As one of the first ambitions of XMG is multi-formalism, dimensions are an efficient way to define different types of description language adapted to target frameworks. Let us see how to define dimensions for LFG and PG.
3 Generating Lexical-Functional Grammars with a metagrammar
3.1 Lexical-Functional Grammar
A lexical-functional grammar (LFG) consists of three main components: 1. context-free rules annotated with functional descriptions, 2. well-formedness principles, and 3. a lexicon. From these components, two main interconnected structures can be built³: a c(onstituent)-structure, and an f(unctional)-structure. The c-structure represents a syntactic tree, and the f-structure grammatical functions in the form of recursive attribute-value matrices. As an example of LFG, consider Fig. 3 below. It contains a toy grammar and the c- and f-structures for the sentence "John loves Mary". In this example, one can see functional descriptions labelling context-free rules (see (1) and (2)). These descriptions are made of equations. For instance, in rule (1), the equation (↑SUBJ) = ↓ constrains the SUBJ feature of the functional description associated with the left-hand side of the context-free rule to unify with the functional description associated with the NP node of the right-hand side. Such equations are unification constraints between attribute-value matrices. Nonetheless, these constraints may not provide enough control on the f-structures licensed by the grammar; LFG hence comes with three additional well-formedness principles (completeness, coherence and uniqueness) [2].
³ This connection is often referred to as functional projection or functional mapping.
Toy grammar:
(1) S     →  NP           VP
    ↑=↓      (↑SUBJ)=↓    ↑=↓
(2) VP    →  V            NP
    ↑=↓      ↑=↓          (↑OBJ)=↓
(3) John   NP, (↑PRED) = 'JOHN', (↑NUM) = SG, (↑PERS) = 3
(4) Mary   NP, (↑PRED) = 'MARY', (↑NUM) = SG, (↑PERS) = 3
(5) loves  V, (↑PRED) = 'LOVE⟨(↑SUBJ)(↑OBJ)⟩', (↑TENSE) = PRESENT

c-structure:
S  ↑=↓
  NP  (↑SUBJ)=↓
    John
  VP  ↑=↓
    V  ↑=↓
      loves
    NP  (↑OBJ)=↓
      Mary

f-structure:
f1: [ PRED   'LOVE⟨(↑SUBJ)(↑OBJ)⟩'
      SUBJ   f2: [ PRED 'JOHN', NUM SG, PERS 3 ]
      OBJ    f3: [ PRED 'MARY', NUM SG, PERS 3 ]
      TENSE  PRESENT ]

Fig. 3. LFG grammar and c- and f-structures for the sentence "John loves Mary"
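The construction of the f-structure of Fig. 3 can be approximated in Python with nested dictionaries. This is a rough sketch under stated assumptions: f-structures are plain dictionaries without structure sharing, the PRED value is written `LOVE<SUBJ,OBJ>` for easy parsing, and `unify`, `GOVERNABLE` and `is_complete_and_coherent` are names introduced here, not part of any LFG tool.

```python
def unify(f, g):
    """Unify two f-structures (nested dicts); a value clash violates uniqueness."""
    if isinstance(f, dict) and isinstance(g, dict):
        result = dict(f)
        for key, value in g.items():
            result[key] = unify(result[key], value) if key in result else value
        return result
    if f == g:
        return f
    raise ValueError(f"clash: {f!r} vs {g!r}")


GOVERNABLE = {"SUBJ", "OBJ"}  # governable functions of the toy grammar


def is_complete_and_coherent(fs):
    """All functions required by PRED are present (completeness), and no others (coherence)."""
    required = set(fs["PRED"].split("<")[1].rstrip(">").split(","))
    return required == GOVERNABLE & set(fs)


# Lexical entries (3) to (5)
john = {"PRED": "JOHN", "NUM": "SG", "PERS": 3}
mary = {"PRED": "MARY", "NUM": "SG", "PERS": 3}
loves = {"PRED": "LOVE<SUBJ,OBJ>", "TENSE": "PRESENT"}

# Rule (2): VP shares its f-structure with V, the NP fills OBJ.
vp = unify(loves, {"OBJ": mary})
# Rule (1): S shares its f-structure with VP, the NP fills SUBJ.
s = unify(vp, {"SUBJ": john})
```

The dictionary `s` mirrors f1 of Fig. 3. The intermediate structure `vp` is not yet complete, since the SUBJ function required by the predicate is still missing.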
3.2 Extending XMG for LFG
In the previous section, we defined the XMG language, and applied it to the description of TAG. Let us recall that one of the motivations of metagrammars in general (and of XMG in particular) is the redundancy which affects grammar extension and maintenance. In TAG, the redundancy is higher than in LFG. Still, as mentioned in [9], in LFG there are redundancies at different levels, namely within the rewriting rules, the functional equations and the lexicon. Thus, the metagrammar approach can prove helpful in this context. Let us now see what type of language could be used to describe LFG.⁴
To describe LFG at an abstract level, one needs to describe its elementary units, which are context-free rules annotated with functional descriptions (e.g., equations) and lexical entries using attribute-value matrices. Context-free rules can be described in XMG using a description language similar to the one for TAG, i.e., using the → (dominance) and ≺ (precedence) relations. One can for instance define different context-free backbones according to the number of elements in the right-hand sides of the LFG rules. These backbones are encapsulated in parameterized XMG classes, where the parameters are used to assign a syntactic category to a given element of the context-free rule, such as in the class BinaryRule below.
⁴ A specification language for LFG has been proposed by [17], but it corresponds more to a model-theoretic description of LFG than to a metagrammar.
BinaryRule[A, B, C] → (x[cat:A] → y[cat:B]) ∧ (x → z[cat:C]) ∧ (y ≺+ z)
exports ⟨x, y, z⟩
We also need to annotate the node variables x, y, z with functional descriptions. Let us see how these functional descriptions Fdesc are built:⁵
Fdesc := ∃(g FEAT) | ¬(g FEAT) | (g∗ FEAT) | (g FEAT) CONST VAL | Fdesc ∨ Fdesc | (Fdesc) | Fdesc ∧ Fdesc
where g refers to an attribute-value matrix, FEAT to a feature, VAL to a (possibly complex) value, CONST to a constraint operator (= for unification, =c for constraining unification, ∈ for set membership, ≠ for difference), (Fdesc) to optionality, and ∗ to LFG's functional uncertainty. Note that g can be complex, that is, it can correspond to a (relative, using ↑ and ↓, or absolute) path pointing to a sub-attribute-value matrix.
To specify such functional descriptions, we can extend XMG in a straightforward manner, with a dedicated dimension and a dedicated description language LLFG defined as follows:
DescLFG := x → y | x ≺ y | x ≺+ y | x = y | x[f:E] | x⟨Fd⟩ | DescLFG ∧ DescLFG
Fd := g | ∃g.f | g.f = v | g.f =c v | g.f ∈ v | ¬Fd | Fd ∨ Fd | (Fd) | Fd ∧ Fd
g, h := ↑ | ↓ | h.f | f ∗ i
where g, h are variables denoting attribute-value matrices, f, i (atomic) feature names, v (possibly complex) values, and ⟨. . .⟩ corresponds to LFG's functional mapping introduced above. With such a language, it now becomes possible to define an XMG metagrammar for our toy LFG as follows.⁶
Srule → br = BinaryRule[S, NP, VP] ∧ br.x⟨↑=↓⟩ ∧ br.y⟨(↑.SUBJ)=↓⟩ ∧ br.z⟨↑=↓⟩
VPrule → br = BinaryRule[VP, V, NP] ∧ br.x⟨↑=↓⟩ ∧ br.y⟨↑=↓⟩ ∧ br.z⟨(↑.OBJ)=↓⟩
⁵ We do not consider here additional LFG operators, which have been introduced in specific LFG environments, such as shuffle, insert or ignore, etc.
⁶ Here, we do not describe the lexical entries; these can be defined using the same language as the LFG context-free rules, omitting the right-hand side.
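The way a parameterized class such as BinaryRule is instantiated and then annotated can be mimicked in Python as follows. The dictionary encoding and the helper names `binary_rule`, `annotate` and `render` are hypothetical; the sketch fixes the order of the two daughters directly instead of solving the y ≺+ z constraint.

```python
def binary_rule(a, b, c):
    """Sketch of BinaryRule[A, B, C]: a mother x with daughters y and z, y before z."""
    return {"x": {"cat": a}, "y": {"cat": b}, "z": {"cat": c}, "order": ["y", "z"]}


def annotate(rule, **equations):
    """Attach a functional description to exported node variables (x, y, z)."""
    for variable, equation in equations.items():
        rule[variable]["fd"] = equation
    return rule


def render(rule):
    """Print the annotated context-free rule on one line."""
    rhs = " ".join(f"{rule[v]['cat']}[{rule[v]['fd']}]" for v in rule["order"])
    return f"{rule['x']['cat']}[{rule['x']['fd']}] -> {rhs}"


s_rule = annotate(binary_rule("S", "NP", "VP"), x="↑=↓", y="(↑SUBJ)=↓", z="↑=↓")
vp_rule = annotate(binary_rule("VP", "V", "NP"), x="↑=↓", y="↑=↓", z="(↑OBJ)=↓")
```

Calling `render` on `s_rule` and `vp_rule` gives one-line versions of rules (1) and (2) of the toy grammar.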
Let us now have a look at a slightly more complex example taken from [9]:
VP →  V      (NP)        PP               (NP)
      ↑=↓    (↑OBJ)=↓    (↑SecondOBJ)=↓   (↑OBJ)=↓
Here, we have two possible positions for the NP node, either before or after the PP node. Such a situation can be described in XMG as follows:
VPrule2 → br = BinaryRule[VP, V, PP] ∧ u[cat:NP] ∧ br.y ≺+ u
∧ br.y⟨↑=↓⟩ ∧ br.z⟨(↑.SecondOBJ)=↓⟩ ∧ u⟨(↑.OBJ)=↓⟩
Here, we do not specify the precedence between the NP and PP nodes. We simply specify that the NP node is preceded by the V node (denoted by y). When compiling this description with a solver such as the one for TAG, two solutions (LFG rules) will be computed. In other terms, the optionality can be expressed directly at the metagrammatical level, and the metagrammar compiler can directly apply LFG's uniqueness principle.
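The two solutions can be reproduced with a brute-force enumeration in Python. This stands in for the constraint solver and is only meant as an illustration; it assumes that the node u is a daughter of the VP node, like y and z, and the name `linearizations` is introduced here.

```python
from itertools import permutations

# Daughters of the VP node in VPrule2: y and z come from BinaryRule, u is the extra NP.
daughters = {"y": "V", "z": "PP", "u": "NP"}
# y precedes z (from BinaryRule) and y precedes u (stated in VPrule2);
# nothing is said about the relative order of z and u.
precedence = [("y", "z"), ("y", "u")]


def linearizations(daughters, precedence):
    """Enumerate the orders of the daughters compatible with all precedence constraints."""
    solutions = []
    for order in permutations(daughters):
        if all(order.index(a) < order.index(b) for a, b in precedence):
            solutions.append([daughters[variable] for variable in order])
    return solutions


right_hand_sides = linearizations(daughters, precedence)
```

`right_hand_sides` contains the two right-hand sides V PP NP and V NP PP, i.e., one LFG rule per position of the NP node.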
In other words, the metagrammar here not only allows for structure sharing via the (conjunctive or disjunctive) combination of parameterized classes, but it also makes it possible to apply well-formedness principles to the described structures. In the example above with the two NP nodes, this well-formedness principle is checked on the constituent structure and indirectly impacts the functional structure (which is the structure concerned with these principles). If we see the functional structures as graphs and equations as constraints on these, one could imagine developing a specific constraint solver. This would make it possible to turn the metagrammar compiler into an LFG parser, which would, while solving tree descriptions for the constituent structure, solve graph-labelling constraints for the functional structure.
Note that a similar approach of structure sharing within an LFG through combinations of elementary units has been proposed by [18]. In their paper, the authors describe how to share information between LFG structures by defining named descriptions, called templates. These templates can abstract over conjunction or disjunction of templates; they are thus comparable to our metagrammar classes. The main difference with our approach is that nothing is said about an interpretation of these templates (they act in a macro-like fashion), while in XMG, one could apply some specific treatments (e.g. constraint solving) on the metagrammar classes.
4 Generating Property Grammars with a metagrammar
4.1 Property Grammar
Property Grammar (PG) [12] differs from TAG or LFG in so far as it does not belong to the generative syntax family, but to the model-theoretic syntax one. In PG, one defines the relations between syntactic constituents not in terms of rewriting rules, but in terms of local constraints (the so-called properties)⁷. The properties licensed by the framework rely on linguistic observations, such as linear precedence between constituents, cooccurrency, mutual exclusion, etc.
Here, we will consider the following 6 properties, which constrain the relations between a constituent (i.e., the node of a syntactic tree) with category A and its sub-constituents (i.e., the daughter-nodes of A):⁸
Obligation     A : △B       at least one B child
Uniqueness     A : B!       at most one B child
Linearity      A : B ≺ C    B child precedes C child
Requirement    A : B ⇒ C    if a B child, then also a C child
Exclusion      A : B ⇎ C    B and C children are mutually exclusive
Constituency   A : S        children must have categories in S
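The six properties can be read as independent tests on the list of daughter categories of a node, as in the following Python sketch. The tuple encoding of properties, the function `violations` and the toy noun-phrase properties `np_properties` are assumptions made for illustration and are not taken from an existing PG.

```python
def violations(children, properties):
    """Return the properties violated by a list of daughter categories."""
    broken = []
    for prop in properties:
        kind, args = prop[0], prop[1:]
        if kind == "obligation":        # at least one B child
            ok = args[0] in children
        elif kind == "uniqueness":      # at most one B child
            ok = children.count(args[0]) <= 1
        elif kind == "linearity":       # every B child precedes every C child
            b, c = args
            ok = all(i < j
                     for i, x in enumerate(children) if x == b
                     for j, y in enumerate(children) if y == c)
        elif kind == "requirement":     # a B child requires a C child
            b, c = args
            ok = b not in children or c in children
        elif kind == "exclusion":       # B and C children exclude each other
            b, c = args
            ok = not (b in children and c in children)
        elif kind == "constituency":    # all children have a category in S
            ok = set(children) <= set(args[0])
        else:
            raise ValueError(f"unknown property: {kind}")
        if not ok:
            broken.append(prop)
    return broken


# Hypothetical properties for a toy noun phrase
np_properties = [
    ("constituency", ("D", "N", "Adj", "Pro")),
    ("obligation", "N"),
    ("uniqueness", "D"),
    ("linearity", "D", "N"),
    ("requirement", "Adj", "N"),
    ("exclusion", "Pro", "N"),
]
```

Since each property is tested on its own, the function returns the list of violated properties rather than a single yes/no answer: `violations(["N", "D", "D"], np_properties)` reports the uniqueness and linearity properties only.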
In a real-size PG, such as the French PG of [19], these properties are encapsulated (together with some syntactic features) within linguistic constructions, and the latter arranged in an inheritance hierarchy⁹. An extract of the hierarchy of [19] is presented in Fig. 4 (fragment corresponding to basic verbal constructions).
V (Verb)
  INTR: ID|NATURE [SCAT ①.SCAT]
  const.: V: ① [CAT V, SCAT ¬(aux-etre ∨ aux-avoir)]

V-n (Verb with negation) inherits V
  INTR: SYN NEGA [RECT ①, DEP Adv-n]
  uniqueness: Adv-ng!, Adv-np!
  requirement: ① ⇒ Adv-n
  linearity: Adv-ng ≺ ①
           : Adv-ng ≺ Adv-np
           : Adv-np ≺ ①.[MODE inf]
           : ①.[MODE ¬inf] ≺ Adv-np

V-m (Verb with modality) inherits V; V-n
  INTR: SYN INTRO [RECT ①, DEP Prep]
  uniqueness: Prep!
  requirement: ① ⇒ Prep
  linearity: ① ≺ Prep

Fig. 4. Fragment of a PG for French (basic verbal constructions)
Let us for instance have a closer look at the properties of the V-n construction of Fig. 4. It says that in French, for verbs with a negation, this negation is made
⁷ An interesting characteristic of these constraints is that they can be independently violated, and thus provide a way to characterize agrammatical sentences.
⁸ Here, we omit lexical properties, such as cat(apple) = N.
⁹ Note that this hierarchy is a disjunctive inheritance hierarchy, i.e., when there is multiple inheritance, the subclass inherits one of its super-classes.