A General Framework for Theory Learning. Perspectives for NLP.


Perspectives for Natural Language Processing

Jordi Alvarez (footnote 1)

Abstract. This position paper introduces a general framework for theory learning based on probabilistic Description Logics (DL). The formal base of the framework is briefly described. The paper also argues that the general theory learning approach is suitable for NLP tasks, and that it can be especially useful for investigating interrelations among different types of natural language information in order to produce better NLP systems.

1 Introduction

In this paper, I advocate a general theory learning framework. The representation formalism used in the framework is an extension of Description Logics (DL) that can express probabilistic knowledge. A general learning procedure based on this framework has been implemented as part of a system called YAYA [2]. This learning procedure is a relational learning procedure that performs computations very similar to those performed by Inductive Logic Programming (ILP) systems.

The learning procedure allows the use of large taxonomies as background knowledge. The main goal of the learning system can then be seen as that of ontology acquisition, or ontology completion if an initial ontology is available as background knowledge.

2 Knowledge Representation

A probabilistic extension of Description Logics (DL) is used as the underlying knowledge representation and reasoning formalism. DL distinguishes between intensional and extensional knowledge. It provides a concept language $L$ that is used to build concept expressions. A knowledge base is defined as a pair $\langle T, A \rangle$, where $T$ is called the TBox and consists of a set of axioms that provide the intensional knowledge, and $A$ is called the ABox and consists of a set of assertions about individual objects that provide the extensional knowledge. DL semantics is defined in a model-theoretic way. The reader can check [7] for details on the DL formalism and concept languages.

A number of concept languages appear in the DL literature. The YAYA concept language (YCL) is an extension of $\mathcal{ALCN}$ (footnote 2) that allows inverse roles, role composition, and a construction equivalent to the CLASSIC SAME-AS construct [3].

1

TALP Research Center, Universitat Politècnica de Catalunya, Mòdul C6, Campus Nord, 08034 Barcelona, email: jal-[email protected]

2

$\mathcal{ALCN}$ contains top, bottom, conjunction, disjunction, existential quantification, universal quantification, negation, and unqualified number restrictions (for example, number restrictions

Why this language? Because in the first experiments done for NLP problems, it seems to be the least expressive language that is expressive enough to solve the problem only by defining a set of axioms and using the reasoning capabilities provided by DL (footnote 3).

The axioms allowed by YAYA are of the form $C \sqsubseteq D$, where $C$ and $D$ are concept expressions in $L$; that is, they are general concept inclusion axioms. So, YAYA TBoxes are free TBoxes. This, in combination with the complexity of the language, would make the reasoning procedures intractable. But in problems in which taxonomic knowledge is important, an accurate representation of it, in combination with the implementation of some optimization techniques for DL reasoning procedures [10], can make the overall reasoning process tractable for practical situations.

2.1 Probabilistic Knowledge

In order to represent probabilistic knowledge, axioms are extended to a more general form: $C \sqsubseteq_{\mu;\sigma} D$, where $\mu$ is a probability and $\sigma$ is a positive real number. Axiom $C \sqsubseteq_{\mu;\sigma} D$ states that the conditional probability $P(D \mid C)$ follows a normal distribution centered in $\mu$ and with a standard deviation of $\sigma$ in the set of models of the axiom. This is somewhat similar to the p-conditioning notion introduced in [9], but intuitively more precise by the use of the normal distribution. Unlike [12], YAYA does not allow probabilistic assertions.
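Written out as a formula, and assuming the symbols $\mu$ and $\sigma$ for the two parameters attached to the axiom, the intended reading is roughly the following. The second line instantiates it with the bird axiom used in footnote 4. This only restates the prose above; it is not the full semantics, which is given in [1].

```latex
\begin{align*}
C \sqsubseteq_{\mu;\sigma} D
  &\;\text{ states }\; P(D \mid C) \sim \mathcal{N}(\mu,\, \sigma^{2}) \\
bird \sqsubseteq_{0.95;0.05} canfly
  &\;\text{ states }\; P(canfly \mid bird) \sim \mathcal{N}(0.95,\, 0.05^{2})
\end{align*}
```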

The model-theoretic semantics on which DL is usually defined has to be adapted for probabilistic axioms. For non-probabilistic DL we can divide the set of interpretations of a knowledge base $\langle T, A \rangle$ into two different sets: the set of models of the knowledge base, and the rest of the interpretations. For probabilistic DL we cannot make this clear distinction, because given an interpretation we cannot decide whether it is a model of the knowledge base or not. Instead, the set of axioms in $T$ determines a probability distribution over interpretations. Every interpretation $I$ is then assigned a probability. This probability may be 0 if $I$ has no chance of being a model of the knowledge base. Details are out of the scope of this position paper and can be found in [1].

or roads with less than 3 lanes).

3

Role axioms also seem interesting, and can easily be introduced following [11].


[...] (a) often a given behaviour can be stated with fewer and more "natural" axioms if probabilities can take part in axioms (footnote 4); and (b) given a behaviour to be defined by a set of axioms and two concepts $C$ and $D$, there always exist some $\mu$ and $\sigma$ such that the axiom $C \sqsubseteq_{\mu;\sigma} D$ is consistent with the behaviour to be learned (and $\mu$ and $\sigma$ can be estimated from the set of examples given to the learning procedure).

These two properties make probabilistic axioms especially interesting for learning purposes. The first property states that the theory to be learned will probably be easier to learn if we allow probabilistic axioms. The second one may provide a path of "intermediate" axioms that are completely consistent with the theory to be learned and may help to find out the real theory axioms (those "definitive" axioms may turn the intermediate axioms into unnecessary ones).
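Property (b) says that the two parameters of a probabilistic axiom can be estimated from the examples. The following is a minimal sketch of one way to do that, under assumptions that are not taken from the paper: each example interpretation is a plain dictionary from concept names to sets of elements, the conditional probability inside one interpretation is the fraction of instances of C that are also instances of D, and the parameters are the mean and population standard deviation of those fractions. The function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def conditional_probability(interpretation, c, d):
    """Fraction of the elements of concept c that also belong to concept d.

    Returns None when c has no instances in this interpretation.
    """
    c_ext = interpretation.get(c, set())
    if not c_ext:
        return None
    return len(c_ext & interpretation.get(d, set())) / len(c_ext)


def estimate_parameters(examples, c, d):
    """Estimate (mu, sigma) for a probabilistic axiom 'c included in d'.

    Assumption: mu and sigma are the mean and population standard
    deviation of the per-example conditional probabilities.
    """
    values = [
        p
        for p in (conditional_probability(i, c, d) for i in examples)
        if p is not None
    ]
    if not values:
        raise ValueError(f"no example contains an instance of {c!r}")
    return mean(values), pstdev(values)


# Hypothetical training set: every example is a whole interpretation.
examples = [
    {"bird": {"a", "b", "c", "d"}, "canfly": {"a", "b", "c", "d"}},
    {"bird": {"e", "f"}, "canfly": {"e"}},
    {"bird": {"g", "h", "i", "j"}, "canfly": {"g", "h", "i"}},
    {"fish": {"k"}},  # no birds here, so this example is skipped
]
mu, sigma = estimate_parameters(examples, "bird", "canfly")
```

Examples without any instance of the antecedent are skipped rather than counted as 0 or 1; that is a design choice of this sketch, not something the text prescribes.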

3 Theory Learning

Theory learning is formalized as the process of determining a theory/TBox $T$ from a set of models of $T$. So, the examples used in the learning process are complete models (footnote 5). This differs from usual machine learning approaches, in which examples are individual entities.

In order to define what theory learning is, we first adapt Shannon information theory notions to the DL domain. An interpretation $I$ can be directly represented as a set of boolean variables: for every concept name $A$ and every element $d \in \Delta^I$ we have a boolean variable (stating whether $d \in A^I$ or $d \notin A^I$); and for every role name $P$ and every pair of elements $d_1, d_2 \in \Delta^I$ we have another boolean variable (stating whether $(d_1, d_2) \in P^I$). The entropy of $I$ is defined as the number of boolean variables necessary to represent $I$ when no additional knowledge is available. This is, in fact, a direct interpretation of Shannon information theory notions.
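The count described above can be sketched directly: one variable per concept name and element, plus one per role name and ordered pair of elements. The function name and the toy family domain are hypothetical, and the sketch assumes that ordered pairs (including a pair of an element with itself) are what is being counted.

```python
def interpretation_entropy(domain, concept_names, role_names):
    """Number of boolean variables needed to describe an interpretation
    when no additional knowledge is available.

    One variable per (concept name, element) and one per
    (role name, ordered pair of elements).
    """
    n = len(domain)
    return len(concept_names) * n + len(role_names) * n * n


# Hypothetical example: a three-person family.
family = {"ann", "bob", "carl"}
variables = interpretation_entropy(
    family,
    concept_names={"man", "woman", "person"},
    role_names={"haswife", "haschild"},
)
```

For this toy domain the count is 3 * 3 concept variables plus 2 * 9 role variables.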

The amount of information provided by a TBox $T$ with respect to $I$ can be intuitively defined as the difference between the number of boolean variables necessary to represent $I$ when no additional knowledge is available, and the minimum number of boolean variables necessary to represent $I$ when $T$ is available (footnote 6). Then, the learning procedure has to maximize the amount of information provided by $T$ with respect to a training set. An axiom $\Phi$ will be interesting for $T$ if $T \cup \{\Phi\}$ provides more information than $T$.
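In symbols, and stated for a single interpretation $I$, the two definitions above can be written as follows. The names $H(I)$, $H(I \mid T)$ and $\mathrm{Inf}$ are assumptions introduced here for readability; how the amount of information is aggregated over a whole training set is not specified in this section.

```latex
\begin{align*}
\mathrm{Inf}(T, I) &= H(I) - H(I \mid T) \\
\Phi \text{ is interesting for } T
  &\iff \mathrm{Inf}(T \cup \{\Phi\}, I) > \mathrm{Inf}(T, I)
\end{align*}
```

Here $H(I)$ is the number of boolean variables needed with no additional knowledge and $H(I \mid T)$ is the minimum number needed when $T$ is available. As a purely hypothetical illustration, if $I$ needs 27 variables on its own and 22 once $T$ is known, then $T$ provides 5 units of information about $I$.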

Computing the amount of information provided by a TBox is intractable. Instead, some measures that provide an idea of the information added by an axiom $\Phi_1$ given another axiom $\Phi_2$ can be efficiently computed [2]. These measures have [...]

4

A very simple and illustrative example is the widely used penguin exception: the probabilistic axiomatization $\{bird \sqsubseteq_{0.95;0.05} canfly,\ penguin \sqsubseteq bird,\ penguin \sqsubseteq \neg canfly\}$ (supposing the probabilities are correct) is simpler than its corresponding non-probabilistic axiomatization $\{bird \sqcap \neg penguin \sqsubseteq canfly,\ penguin \sqsubseteq bird,\ penguin \sqsubseteq \neg canfly\}$. Yes, only one antecedent is a bit larger. But this is quite a simple theory; for larger theories the differences will surely be bigger.

5

For example, if we are working in NLP, an example could correspond to a whole sentence, with all the information we want to learn and from which we want to learn (syntactic groups, syntactic relations between them, word senses, ...).

6

This captures the fact that some of the boolean variables necessary when no knowledge is available can be deduced from others by a learning procedure.

The YAYA learning procedure is not concerned with PAC learnability nor with Least Common Subsumer (LCS) computation [4], as most work on learning DLs is (see [5, 8] for example). Instead, our learning procedure is a general axiom exploration procedure that is heuristically guided by the information theory measures mentioned above. An initial theory $T_0$ is evolved (footnote 7) through the addition of new axioms. This is done by the application of a set of induction rules that produce new axioms from axioms already existing in the theory.

There are 19 different induction rules, which can be classified mainly into generalization, specialization, and general exploration rules (footnote 8). In addition, some of these rules are specially designed to work with taxonomic knowledge.

Although there is no space to explain all 19 rules, we can sketch some of them to see how they work:

expand antecedent rule: It adds an existential construction to the "end" of the antecedent of an axiom.

For example, suppose we are acquiring a theory about family relationships from a group of families (footnote 9). Suppose $\Phi_1 = (man \sqsubseteq_{0.5;0.3} \exists haswife.woman) \in T_t$ (footnote 10). Then, this rule would explore axioms by adding simple existential constructions to the end of the antecedent. Among others, it will produce the axiom $\Phi_2 = man \sqcap \exists haschild.person \sqsubseteq_{0.87;0.1} \exists haswife.woman$, where the parameters for the distribution are supposed to have been inferred from the training set.

As $\Phi_2$ provides more information about men having children than $\Phi_1$, it will be added to $T_t$.

same-as axiom rule: It builds new axioms with SAME-AS constructions in the consequent. Going on with the previous example, suppose we have $\Phi_3 = person \sqcap \exists hasgrandparent \sqsubseteq_{1;0} \exists hasparent.\exists hasparent.person$, and that $\Phi_3 \in T_t$.

The application of the same-as axiom rule to $\Phi_3$ would build an axiom $\Phi_4$ with the same antecedent and a consequent constituted by a SAME-AS construct built from the antecedent and the consequent of $\Phi_3$. That is, $\Phi_4 = person \sqcap \exists hasgrandparent \sqsubseteq_{1;0} [\text{SAME-AS}\ hasparent \circ hasparent;\ hasgrandparent]$.

$\Phi_4$ increases the amount of information of $T_t$, as the consequent of $\Phi_4$ is more specific than that of $\Phi_3$. So, the YAYA learning procedure would add $\Phi_4$ to $T_t$.
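To make the first of the two rules above concrete, here is a minimal sketch of an expand-antecedent step. It is not YAYA's implementation: the `Axiom` class, the string encoding of concept expressions, and the choice to leave the distribution parameters unset until they are re-estimated from the training set are all assumptions of this example.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Tuple


@dataclass(frozen=True)
class Axiom:
    """Hypothetical encoding of a probabilistic inclusion axiom."""
    antecedent: Tuple[str, ...]    # conjuncts, read as c1 AND c2 AND ...
    consequent: str
    mu: Optional[float] = None     # None means "not yet estimated"
    sigma: Optional[float] = None


def expand_antecedent(axiom, role_names, concept_names):
    """Yield candidate axioms whose antecedent is extended with one simple
    existential construction 'exists role.concept'.

    Candidates that would repeat an existing conjunct, or that would copy
    the consequent into the antecedent, are skipped (a choice made here).
    """
    for role, concept in product(sorted(role_names), sorted(concept_names)):
        conjunct = f"exists {role}.{concept}"
        if conjunct in axiom.antecedent or conjunct == axiom.consequent:
            continue
        yield Axiom(axiom.antecedent + (conjunct,), axiom.consequent)


# The first axiom of the family example, with the parameters from the text.
phi1 = Axiom(("man",), "exists haswife.woman", mu=0.5, sigma=0.3)
candidates = list(
    expand_antecedent(phi1, {"haschild", "haswife"}, {"person", "woman"})
)
```

One of the candidates has the antecedent `man AND exists haschild.person`, which matches the shape of the second axiom in the example; its parameters would then be estimated from the training set before deciding whether it is interesting.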

The iterative rule application process is repeated until none of the axioms "reachable" through the application of an induction rule is interesting for $T$. The YAYA learning procedure has been designed from practical principles. Unfortunately, there is no theoretical bound on the amount of information captured by the TBox resulting from the learning process with respect to $T$.
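The stopping condition just described can be sketched as a generic greedy loop. This is an illustration only: the text does not say whether YAYA adds the best interesting axiom or any interesting axiom at each step, so picking the best one is an assumption, and `rules` and `information` are hypothetical callables standing in for the induction rules and the information measures. The toy run uses integers in place of axioms so that the behaviour is easy to follow.

```python
def greedy_theory_search(initial_theory, rules, information):
    """Grow a theory by repeatedly adding the reachable candidate that most
    increases `information`; stop when no reachable candidate improves it.
    """
    theory = set(initial_theory)
    while True:
        best_axiom, best_score = None, information(theory)
        for rule in rules:
            for axiom in list(theory):
                for candidate in rule(axiom):
                    if candidate in theory:
                        continue
                    score = information(theory | {candidate})
                    if score > best_score:
                        best_axiom, best_score = candidate, score
        if best_axiom is None:
            return theory          # nothing reachable is interesting
        theory.add(best_axiom)


# Toy stand-ins: "axioms" are integers, a rule derives two new ones, and
# the information measure counts the even members that are at most 8.
def toy_rule(n):
    return [n + 1, n * 2]


def toy_information(theory):
    return sum(1 for n in theory if n % 2 == 0 and n <= 8)


learned = greedy_theory_search({1}, [toy_rule], toy_information)
```

In the toy run the value 6 would also raise the measure, but it is only reachable through 3 or 5, and neither of those is interesting on its own, so the search never gets there. This is the kind of incompleteness a greedy procedure can exhibit.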

7

This initial theory may be empty, it may contain a subset of the theory to be learned, and it may also take into account ontological knowledge.

8

The first two types are similar to ILP generalization and specialization operators.

9

Note that here the examples correspond to families and not to individuals, as would be the case in most ILP systems.

10

Throughout this example, it is supposed that man, woman, and person are concept names that appear in the training set, and [...]


[...] that an interesting axiom may be reached from a lot of different search states. This is important because in some sense it reduces the number of local minima in the search space. And, as the procedure sketched above is in fact a greedy search, it may be affected by local minima.

On the other hand, the YAYA learning procedure is not complete; that is, it does not guarantee that it will find a given axiom or a given theory. Nevertheless, it has been tested successfully on some typical ILP benchmarks, such as the family relationships and the chess king-rook-king illegal position ones [16].

4 YAYA and Natural Language Processing

The main goal of YAYA is to be applied to NLP problems. Relational learning systems have been shown to be well suited for Natural Language Learning (NLL). ILP has been successfully applied in a number of NLP problems [15, 14, 6].

With this purpose, the widely used WordNet lexical ontology [13] has been integrated into YAYA. WordNet is automatically converted into a set of axioms that can be used: (1) to provide a background theory to the learning procedure; and (2) for reasoning purposes (once the learning procedure has done its job).
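The text does not say which axioms the taxonomy is mapped into. As a rough illustration only, the sketch below assumes the simplest possible mapping, one inclusion axiom per direct hypernym link, and uses a small hand-written dictionary instead of the real WordNet data. All names in it are hypothetical.

```python
def taxonomy_to_axioms(hypernyms):
    """Turn {concept: [direct hypernyms]} into sorted (sub, super) pairs,
    each read as the inclusion axiom 'sub is included in super'.
    """
    return sorted(
        (sub, sup) for sub, sups in hypernyms.items() for sup in sups
    )


def subsumers(concept, axioms):
    """All concepts reachable from `concept` by following axioms upward."""
    parents = {}
    for sub, sup in axioms:
        parents.setdefault(sub, set()).add(sup)
    found, frontier = set(), [concept]
    while frontier:
        for sup in parents.get(frontier.pop(), ()):
            if sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


# Hand-written toy taxonomy standing in for WordNet hypernym links.
toy_hypernyms = {
    "penguin": ["bird"],
    "bird": ["vertebrate"],
    "dog": ["canine"],
    "canine": ["vertebrate"],
    "vertebrate": ["animal"],
}
axioms = taxonomy_to_axioms(toy_hypernyms)
penguin_subsumers = subsumers("penguin", axioms)
```

The resulting pairs can serve both uses named above: as a background theory handed to a learner, and as the basis for simple subsumption queries such as `subsumers`.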

Despite the large number of concepts (synsets in WordNet terminology) in this ontology, a careful selection of the axioms into which the taxonomy is mapped, together with the implementation of some optimization techniques from [10], keeps the reasoning tasks tractable.

YAYA provides a powerful representation, reasoning, and learning framework for NLP tasks. This has the immediate advantage of making it possible to tackle different NLP tasks using the same environment. But most importantly, it provides a way to experiment with the integration of several of these tasks, and to study the interaction among them. Additionally, and unlike ILP systems, YAYA has been designed specially to benefit from available taxonomic knowledge.

Traditionally, several phases have been defined in NLP: morphological analysis, syntactic analysis, word sense disambiguation, contextual resolution, conceptual representation, etc. The complete analysis of a natural language text consists of solving each one of the phases in an ordered way. Then, one phase can use information produced by previous phases; but, of course, it cannot use information from later phases, as it has not yet been produced.

The goal of the division into phases is to keep the complexity of the problem below certain limits; but it may miss some important interactions between different phases. For NLL systems this may result in the substitution of a missed interaction by a set of special situations that approximate it, taking into account only information from previous phases.

Reducing NLP to a single big phase is not realistic at this moment. The number of interactions that should be taken into account by a learning procedure is probably too large to obtain interesting results in a reasonable amount of time. Instead, a restricted interaction model can be used to explore interesting interactions. YAYA provides the necessary elements to experiment with different interaction models. Nevertheless, this work is at its initial stage at this moment.

Some experiments have been performed for NLP. More con- [...] roles. The whole set of restrictions implicitly present in the training set has successfully been learned.

REFERENCES

[1] Jordi Alvarez, `Extending the tableaux calculus for probabilistic description logics', Technical Report LSI-00-R, Universitat Politècnica de Catalunya, (2000).

[2] Jordi Alvarez, `A formal framework for theory learning using description logics', Technical Report LSI-00-R, Universitat Politècnica de Catalunya, (2000).

[3] Alex Borgida and Peter F. Patel-Schneider, `A semantics and complete algorithm for subsumption in the CLASSIC description logic', Journal of Artificial Intelligence Research, 1, 277-308, (1994).

[4] W. Cohen, A. Borgida, and H. Hirsh, `Computing least common subsumers in description logics', in Proceedings of AAAI-92, pp. 754-760, (1992).

[5] W. Cohen and H. Hirsh, `The learnability of description logics with equality constraints', Machine Learning, 17(2-3), 169-199, (1994).

[6] James Cussens, David Page, Stephen Muggleton, and Ashwin Srinivasan, `Using Inductive Logic Programming for Natural Language Processing', in ECML'97 - Workshop Notes on Empirical Learning of Natural Language Tasks, eds., W. Daelemans, T. Weijters, and A. van den Bosch, pp. 25-34, Prague, (1997).

[7] F.M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf, `Reasoning in description logics', in Studies in Logic, Language and Information, ed., G. Brewka, 193-238, CSLI Publications, (1996).

[8] M. Frazier and L. Pitt, `CLASSIC learning', Machine Learning, 25, 151-193, (1996).

[9] Jochen Heinsohn, `Probabilistic description logics', in Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, Seattle, Washington, (1994).

[10] I. Horrocks and P. Patel-Schneider, `Optimising description logic subsumption', Journal of Logic and Computation, 9(3), 267-293, (1999).

[11] Ian Horrocks and Ulrike Sattler, `A description logic with transitive and inverse roles and role hierarchies', Journal of Logic and Computation, 9(3), 385-410, (1999).

[12] Manfred Jaeger, `Probabilistic reasoning in terminological logics', in Proceedings of the 4th International Conference on the Principles of Knowledge Representation and Reasoning, (1994).

[13] G.A. Miller, `WordNet: An on-line lexical database', International Journal of Lexicography, 3(4), 235-312, (1990).

[14] Raymond J. Mooney, `Inductive logic programming for natural language processing', in Proceedings of the Sixth International Inductive Logic Programming Workshop, (1996).

[15] Stephen Muggleton, `Inductive logic programming: Issues, results, and the challenge of learning language in logic', Artificial Intelligence, 114(1-2), 283-296, (1999).

[16] J.R. Quinlan, `Learning logical definitions from relations', Machine Learning, 5, 239-266, (1990).

[17] J. Zelle and R. Mooney, `Learning semantic grammars with constructive inductive logic programming', in Proceedings of 11th AAAI, pp. 817-822, (1993).