Perpectives for Natural Language Processing
Jordi Alvarez 1
Abstract. Thispositionpaperintroducesageneralframe-
workfor theory learning basedon probabilistic Description
Logics(DL).Theformalbaseoftheframeworkisbrieyde-
scribed.Thepaperalsoarguesthatthegeneraltheorylearn-
ingapproachissuitabletobeusedforNLPtasks;anditcan
bespeciallyusefultoinvestigate interrelationsamongdier-
enttypesofnaturallanguageinformationinordertoproduce
betterNLPsystems.
1 Introduction
Inthispaper,Iadvocateforageneraltheorylearningframe-
work.Therepresentationformalismusedinourframeworkis
anextensionofDescriptionLogics(DL)thatallowstoexpress
probabilistic knowledge.A generallearning procedure based
ontheprevious frameworkhasbeenimplementedaspart of
a system called YAYA [2 ]. This learning procedure is a re-
lationallearning procedurethatperformscomputationsvery
similartothose performed byInductiveLogicProgramming
(ILP)systems.
Thelearningprocedureallowstheusageoflargetaxonomies
as background knowledge. The main goal of the learning
system can thenbe seen as that of ontology acquisition or
completion (ifaninitial ontology isavailableas background
knowledge).
2 Knowledge Representation
A probabilistic extensionof DescriptionLogics(DL)is used
astheunderlyingknowledgerepresentationandreasoningfor-
malism.DLdistinguishesbetweenintensionalandextensional
knowledge. Itprovides aconcept language Lthat isusedto
buildconceptexpressions. Aknowledgebase isdenedas
a pair <T;A >where T is called TBox and consists of a
setofaxiomsthatprovidetheintensionalknowledge; andA
is called ABox and consists of a set of assertions about in-
dividualobjects thatprovidetheextensional knowledge. DL
semanticsisdenedinamodel-theoreticway.Thereadercan
check[7 ]fordetailsonDLformalismandconceptlanguages.
A number of concept languages appear inDL literature.
YAYA concept language (YCL) is an extension of ALCN 2
1
TALP Research Center, Universitat Politecnica deCatalunya,
Modul C6, Campus Nord, 08034 Barcelona, email: jal-
2
ALCN containstop,bottom,conjunction,disjunction,existen-
tial quantication, universal quantication, negation, and un-
qualiednumberrestrictions(for example, numberrestrictions
thatallowsinverseroles,rolecomposition,andaconstruction
equivalenttotheCLASSICSAME-ASconstruct[3 ].
Whythislanguage? Becauseintherstexperimentsdone
forNLPproblems,itseemstobethelessexpressivelanguage
thatisexpressiveenoughtosolvetheproblemonlybydening
asetofaxiomsandusingthereasoningcapabilitiesprovided
byDL 3
.
The axioms allowed by YAYA are of the form C D,
where C and D are concept expressions inL; that is, they
aregeneral conceptinclusionaxioms.So,YAYA TBoxesare
free TBoxes. This, in combination with language complex-
itywouldmakethereasoningproceduresintractable. Butin
problemsinwhichtaxonomicknowledgeisimportant,anac-
curaterepresentationofitincombinationwiththeimplemen-
tationofsomeoptimizationtechniquesforDLreasoningpro-
cedures[10 ]canmaketheoverallreasoningprocesstractable
forpracticalsituations.
2.1 Probabilistic Knowledge
Inordertorepresentprobabilisticknowledge,axiomsareex-
tendedtoamoregeneralform:C;D;whereisaproba-
bilityandisapositiverealnumber.AxiomC
;
Dstates
thattheconditionalprobabilityP(DjC)followsanormaldis-
tributioncentered in and with astandard deviation of
inthesetof modelsof theaxiom. This issomewhatsimilar
tothep-conditioning notionintroducedin[9],butintuitively
more precise by the use of the normal distribution. Unlike
[12 ],YAYAdoesnotallowprobabilisticassertions.
Model theoretic semantics on which DL is usually de-
ned has to be adapted for probabilistic axioms. For non-
probabilisticDLwecandividethesetofinterpretationsofa
knowledgebase=<T;A>intwodierentsets:thesetof
modelsof , and the rest ofinterpretations. Intuitively,for
probabilistic DL we forget about this clear distinction. For
probabilistic DL we cannot make this clear distinction be-
causegivenaninterpretationwecannotdecidewhether itis
amodelof or not.Instead, the setof axiomsinT deter-
minesaprobabilitydistributionoverinterpretations.Every
interpretationIisthenassignedaprobability.Thisproba-
bilitymaybe0ifIhasnochancetobeamodelof.Details
areoutofthescopeofthispositionpaperandcanbefound
at[1].
orroadswithlessthan3lanes).
3
Roleaxiomsalsoseeminteresting, andcaneasilybeintroduced
following[11].
tenagivenbehaviourcanbestatedwithlessandmore\nat-
ural" axioms if probabilities can take part inaxioms 4
; and
(b)Givenabehaviour tobedenedbya setofaxioms and
twoconceptsCandD,therealwaysexistsomeandsuch
that axiom C
;
D is consistent with the behaviour to
be learned(and and can be estimated from the set of
examplesgiventothelearningprocedure).
These two properties make probabilistic axioms specially
interesting for learning purposes. The rst property states
thatthetheorytobelearnedwillprobablybeeasiertolearn
ifweallowprobabilisticaxioms.Thesecondonemayprovide
apathof \intermediate"axiomsthat are completelyconsis-
tentwiththetheoryto belearnedandmayhelp tondout
real theory axioms (those \denitive" axioms may convert
intermediateaxiomsintounnecessaryones).
3 Theory Learning
Theorylearningisformalizedastheprocessofdetermininga
theory/TBoxT
fromasetofmodelsofT
.So,theexamples
usedinthelearningprocessarecompletemodels 5
.Thisdiers
from usual machine learning approaches in which examples
areindividualentities.
Inorder to dene what theorylearning is, we rstadapt
Shannon information theory notionsto DL domain. An in-
terpretation can be directlyrepresented as aset of boolean
variables:foreveryconceptnameAandeveryelementd2 I
we have a boolean variable (stating whether d 2 A I
or
d 62 A I
); and for every role name P and every pair of ele-
mentsd
1
;d
2
2
I
wehaveanotherbooleanvariable(stating
whether (d1;d2) 2 P I
). The entropy of I is dened as the
numberof boolean variables necessary to represent I when
noadditionalknowledgeisavailable.Thisis,infact,adirect
interpretationofShannoninformationtheorynotions.
The amount of information provided by a TBox T with
respect to I canbe intuitivelydened as the dierence be-
tweenthenecessarynumberofbooleanvariablestorepresent
Iwhennoadditionalknowledgeisavailable,andthethemin-
imum numberof boolean variables necessaryto represent I
when T is available 6
. Then, the learning procedure has to
maximizetheamountofinformationprovidedbyT withre-
specttoatrainingset.Anaxiomwill beinteresting forT
ifT [fgprovidesmoreinformationthanT.
ComputingtheamountofinformationprovidedbyaTBox
is intractable. Instead, some measures that provide anidea
of theinformation addedby anaxiom
1
given anotherax-
iom2 canbeeÆcientlycomputed[2].Thesemeasureshave
4
Avery simple and illustrative example is the penguin excep-
tion so widely used: the following probabilistic axiomatization
fbird0:95;0:05canfl y;penguinbird;penguin:canfl yg
(supposing probabilities are correct) issimpler than its corre-
spondingnon-probabilisticaxiomatization:fbirdu:penguin
canfl y;penguin bird;penguin : canfl yg. Yes, onlyan
antecedent isabit larger.Butthisisa quitesimpletheory;for
largertheoriesdierenceswillsurebebigger.
5
Forexample,ifweareworkinginNLP,anexamplecouldcorre-
spondtoawholesentence, withalltheinformationwewantto
learnandfromwhichwewanttolearn(syntacticgroups,syntac-
ticrelationsbetweenthem,wordsenses::: ).
6
Thiscapturesthefactthatsomeofthebooleanvariablesneces-
sarywhennoknowledge isavailablecanbededucedfromother
byalearningprocedure.
YAYAlearning procedureisnotinvolvedwithPAClearn-
abilitynorLeastCommonSubsumer(LCS)computation[4],
as mostworkin learning DLsdoes(see [5 ,8]for example).
Instead, our learning procedure is a general axiom explo-
rationprocedurethatis heuristicallyguidedbytheinforma-
tiontheorymeasuresmentionedabove.AninitialtheoryT0is
evolved 7
throughtheadditionofnewaxioms.Thisisdoneby
theapplicationofaset ofinduction rules thatproducenew
axiomsfromaxiomsalreadyexistinginthetheory.
Thereare19dierentinductionrulesthatcanbeclassied
mainlyintogeneralization,specialization,andgeneralexplo-
ration rules 8
. In addition, some of these rules are specially
designedtoworkwithtaxonomicknowledge.
Althoughthere isnospace to explain the whole 19rules,
wecansketchsomeofthemtoseehowtheywork:
expandantecedentrule It adds an existential construc-
tiontothe\end"oftheantecedentofanaxiom.
For example, suppose we are acquiring a theory about
familyrelationships from aa groupof families 9
. Suppose
1
= man
0:5;0:3
9haswife:woman 2 T
t 10
.
Then, this rule would explore axioms by adding sim-
ple existential constructions to the end of antecedent.
Amongothersit will producetheaxiomPhi
2
= manu
9haschil d:person 0:87;0:1 9haswife:woman, where the
parametersfor thedistributionaresupposedtohavebeen
inferredfromthetrainingset.
As
2
provides more information in what refers to man
havingchildsthan1,itwillbeaddedtoTt.
same-asaxiomrule It builds new axioms with SAME-
AS constructions in the consequent. Going on with the
previous example, suppose we have 3 = person u
9hasgrandparent
1;0
9hasparent:9hasparent:person;
andwehavethat32Tt.
The application of the same-as axiom rule to
3
would build an axiom 4 with the same antecedent
and a consequent constituted by a SAME-AS construct
built from the antecedent and the consequent of
3 .
That is, 4 = person u 9hasgrandparent 1;0
[SAME AShasparentÆhasparent;hasgranparent].
4 increases the amount of information ofTt,as 4 con-
sequent is morespecic than
3
one. So, YAYA learning
procedurewouldadd4 toTt.
Theiterative ruleapplication processis repeateduntilall
theaxioms\reachable"throughtheapplicationofaninduc-
tionrulearenotinterestingfor T.YAYAlearningprocedure
has been designed from practical principles. Unfortunately
thereis notheoretical boundonthe amountof information
capturedbytheTBoxresultingfromthelearningprocesswith
respecttoT
.
7
Thisinitialtheorymaybeempty,itmaycontainasubsetofthe
theorytobelearned,anditmayalsotakeintoaccountontological
knowledge.
8
ThersttwotypesaresimilartoILPgeneralizationandspecial-
izationoperators.
9
Note thatherethe examplescorrespondto familiesand notto
individuals,asitwouldcorrespondinmostILPsystems.
10
Throughoutthisexample,itissupposedthatman,woman,and
personareconceptnamesthatappearinthe trainingset,and
thataninterestingaxiommaybereachedfromalotofdier-
entsearchstates.Thisisimportant becauseinsomesenseit
reducestheamountoflocalminimainthesearchspace.And
astheproceduresketchedaboveisinfactagreedysearch,it
maybeaectedbylocalminima.
Byother hand,the YAYAlearning procedure isnot com-
plete;thatis,itdoesnotguaranteeitwillndagivenaxiom
oragiventheory.Nevertheless,ithasbeentestedsuccessfully
onsometypicalILPbenchmarkssuchasthefamilyrelation-
ships,andthechessking-rook-kingillegalpositionones[16 ].
4 YAYA and Natural Language Processing
The maingoal of YAYA isto be appliedto NLP problems.
Relationallearningsystemshavebeenshowntobewellsuited
forNaturalLanguageLearning(NLL).ILPhasbeensuccess-
fullyappliedinanumberofNLPproblems[15 ,14 ,6].
Withthispurpose,thewidelyusedWordNetlexicalontol-
ogy [13 ] has been integrated into YAYA. WordNet is auto-
maticallyconvertedintoasetofaxiomsthatcanbeused:(1)
To provide a backgroundtheory to the learning procedure;
and(2)Forreasoningpurposes(oncethelearningprocedure
hasdoneitsjob).
Despite the big amount of concepts (synsets in WordNet
terminology)inthementionedontology,acarefulselectionof
theaxiomsintowhichthetaxonomyismapped,andtheim-
plementationofsomeoptimizationtechniquesfrom[10 ]main-
tainthereasoningtaskstractable.
YAYA provides apowerful representation, reasoning, and
learning framework for NLP tasks. This has the immediate
advantagetomakepossibletoboarddierentNLPtasksusing
thesameenvironment.Butmostimportant,itprovidesaway
to experimentwith theintegration of several ofthese tasks;
and tostudytheinteractionamongthem.Additionally,and
dierentto ILP systems,YAYA hasbeendesigned specially
totakebenetfromavailabletaxonomicknowledge.
Traditionally, several phases have been dened in NLP:
morfologicalanalysis,syntacticalanalysis,wordsensedisam-
biguation, contextual resolution, conceptual representation,
etc.Thecompleteanalysisofanaturallanguagetextconsists
of solving eachone of the phases in anordered way.Then,
onephasecanuseinformationproducedbyprevious phases;
but,ofcourse,itcannotuseinformationfromlaterphasesas
ithasnotalreadybeenproduced.
Thegoalofthephasesanalysisistomaintainthecomplex-
ityoftheproblembelowcertainlimits;butitmaymisssome
importantinteractionsbetweendierentphases.ForNLLsys-
temsthismayresultinthesubstitutionofamissedinteraction
byasetofspecialsituationsthat approximateittakinginto
accountonlyinformationinpreviousphases.
ReducingNLP toanonlybigphase isnotrealisticatthis
moment. The number of interactions that should be taken
intoaccountbyalearningprocedureisprobablytoolargeto
obtaininterestingresultsinareasonableamountoftime.In-
stead,arestrictedinteractionmodelcanbeusedtoexplorein-
terestinginteractions.YAYAprovidesthenecessaryelements
toexperimentwithdierentinteractionmodels.Nevertheless,
thisworkisatitsinitialstageatthismoment.
SomeexperimentshavebeenperformedforNLP.Morecon-
roles. The whole setof restrictions implicitly present inthe
trainingsethassuccessfullybeenlearned.
REFERENCES
[1] JordiAlvarez,`Extendingthetableauxcalculusforprobabilis-
ticdescriptionlogics',TechnicalReportLSI-00-R,Universitat
PolitcnicadeCatalunya,(2000).
[2] Jordi Alvarez, `A formalframework for theory learning us-
ingdescriptionlogics',TechnicalReportLSI-00-R,Universi-
tatPolitcnicadeCatalunya,(2000).
[3] AlexBorgidaandPeterF.Patel-Schneider,`Asemanticsand
completealgorithmforsubsumptionintheCLASSICdescrip-
tionlogic',JournalofArticialIntelligenceResearch,1,277{
308,(1994).
[4] W.Cohen,A.Borgida,andH.Hirsh,`Computingleastcom-
monsubsumersindescriptionlogics',inProceedingsofAAAI-
92,pp.754{760,(1992).
[5] W.CohenandH.Hirsh,`Thelearnabilityofdescriptionlogics
withequalityconstraints',MachineLearning,17(2{3),169{
199,(1994).
[6] JamesCussens,DavidPage,StephenMuggleton,andAshwin
Srinivasan, `Using Inductive Logic Programming for Natu-
ral Language Processing', in ECML'97 { Workshop Notes
on Empirical Learning of Natural Language Tasks, eds.,
W.Daelemans,T.Weijters,andA.vanderBosch,pp.25{34,
Prague,(1997).
[7] F.M.Donini,M.Lenzerini,D.Nardi,andA.Schaerf,`Rea-
soning indescription logics',inStudies in Logic, Language
and Information, ed., G. Brewka, 193{238, CLSI Publica-
tions,(1996).
[8] M.FrazierandL.Pitt,`CLASSIClearning',MachineLearn-
ing,25,151{193,(1996).
[9] Jochen Heinsohn, `Probabilistic description logics', in Pro-
ceedingsof the10thConferenceonUncertaintyinArticial
Intelligence,Seattle,Washington,(1994).
[10] I. Horrocks and P. PatelSchneider, `Optimising description
logicsubsumption',JournalofLogicandComputation,9(3),
267{293,(1999).
[11] Ian Horrocksand Ulrike Sattler,`A descriptionlogics with
transitiveand inverserolesandrolehierarchies', Journal of
LogicandComputation,9(3),385{410,(1999).
[12] Manfred Jaeger, `Probabilistic reasoning in terminological
logics',inProceedingsofthe4thInternationalConferenceon
thePrinciplesof Knowledge Representation andReasoning,
(1994).
[13] G.A.Miller,`Wordnet:Anonlinelexicaldatabase',Interna-
tionalJournalofLexicography,3(4),235{312,(1990).
[14] RaymondJ.Mooney,`Inductivelogicprogrammingfornatu-
rallanguageprocessing',inProceedingsoftheSixthInterna-
tionalInductiveLogicProgrammingWorkshop,(1996).
[15] StephenMuggleton,`Inductivelogicprogramming:Issues,re-
sults,andthechallengeoflearninglanguageinlogic',Arti-
cialIntelligence,114(1{2),283{296,(1999).
[16] J.R. Quinlan, `Learning logical denitions from relations',
MachineLearning,5,239{266,(1990).
[17] J.Zelleand R.Mooney,`Learning semanticgrammarswith
constructiveinductivelogicprogramming',inProceedingsof
11thAAAI,pp.817{822,(1993).