From Factorial and Hierarchical HMM to Bayesian Network : A Representation Change Algorithm


HAL Id: inria-00000548

https://hal.inria.fr/inria-00000548

Submitted on 9 Nov 2006



Sylvain Gelly, Nicolas Bredeche, Michèle Sebag

To cite this version:

Sylvain Gelly, Nicolas Bredeche, Michèle Sebag. From Factorial and Hierarchical HMM to Bayesian Network : A Representation Change Algorithm. Symposium on Abstraction, Reformulation and Approximation, Jul 2005, Edinburgh, Scotland, UK. inria-00000548


Sylvain Gelly, Nicolas Bredeche, and Michèle Sebag

Equipe Inference & Apprentissage - Projet TAO (INRIA Futurs),
Laboratoire de Recherche en Informatique,
Université Paris-Sud, 91405 Orsay Cedex
FRANCE
{gelly,bredeche,sebag}@lri.fr
http://tao.lri.fr

Abstract. Factorial Hierarchical Hidden Markov Models (FHHMM) provide a powerful way to endow an autonomous mobile robot with efficient map-building and map-navigation behaviors. However, the inference mechanism in FHHMM has seldom been studied. In this paper, we suggest an algorithm that transforms a FHHMM into a Bayesian Network in order to be able to perform inference. As a matter of fact, inference in Bayesian Networks is a well-known mechanism and this representation formalism provides a well grounded theoretical background that may help us to achieve our goal. The algorithm we present can handle two problems arising in such a representation change: (1) the cost due to taking into account multiple dependencies between variables (e.g. computing $P(Y \mid X_1, X_2, \dots, X_n)$), and (2) the removal of the directed cycles that may be present in the source graph. Finally, we show that our model is able to learn faster than a classical Bayesian network based representation when few (or unreliable) data are available, which is a key feature when it comes to mobile robotics.

1 Introduction

Many works in mobile robotics rely on probabilistic models (such as POMDP or HMM¹, etc.) to build a map of an environment [2, 1, 7, 4, 5]. Indeed, the properties of these models, as well as of their extensions, are particularly relevant in the context of robotics. Firstly, the problem of knowledge generalization can partly be solved if we consider a hierarchical model (encoding a given place at several granularities) [6]. Secondly, taking invariants into account can also be achieved if we consider a model that implements a factorization operator (e.g. a given place location should be perceived with no consideration for the actual [...]

¹ In the remainder of the article, we deal with HMM rather than with POMDP. The particularity of the latter is that they explicitly take action into account, which [...]

[...]ied separately, it is quite difficult to endow a HMM-based model with these two simultaneously. As far as we know, there exists no efficient inference algorithm that can deal with such a model.

In this paper, we present an approach to perform inference within a Factorial and Hierarchical HMM (i.e. FHHMM²). Our approach relies on an algorithm that performs a representation change from FHHMM to the Bayesian Network representation formalism. The choice of the Bayesian Network formalism is motivated by its strong theoretical foundations and the efficient algorithms that exist for it.

However, several difficulties arise with such a representation change because of the structural differences between the two formalisms and their intrinsic properties. In particular, we identify two main problems that must be taken into account during this process:

- There exist multiple dependencies in the FHHMM. These imply an exponential growth of the number of parameters to learn, which is a challenging problem when dealing with a small set of examples (this is an intrinsic property in mobile robotics);
- There exist directed cycles in the conditional dependencies between the variables of a FHHMM. It is well known that directed cycles are not allowed within a Bayesian network (we should note however that these dependencies are a problem only between variables at the same time step (see section 2)).

In the following section, we present the HMM formalism and the factorial and hierarchical extensions. Then, we describe the inference problem in the case of FHHMM. Sections 3 and 4 present our approach along with the representation change algorithm. Lastly, section 5 presents two experiments which compare the resulting model with classical Bayesian networks on a learning task. We conclude this paper with a discussion about the interesting properties shown by our model as well as the compromise we made so as to be able to learn from few data, which is often the case for a mobile robot building a map of its environment.

2 Problem Setting

2.1 Hierarchical and Factorial HMM

Known limitations of HMM, and more generally of Markov models, concern scaling, taking independent phenomena into account, and the difficulty to generalize. However, there exist several extensions that address these problems. In the following, we focus our attention on hierarchical HMM [7, 5] and factorial HMM [3]³.

² We use this abbreviation in the remainder of the article.

³ These extensions have been used separately (with POMDPs) for map-building by a [...]

[...] of links between the states of an HMM, and then reduce the algorithmic complexity of learning as well as improving the accuracy. On the other hand, the factorial extension makes it possible to explain observations with several causes rather than only one. In this case, the goal is to turn the $P(Y \mid X)$ of HMM into $P(Y \mid X^1, X^2, \dots, X^n)$. The $X^i$ are hidden variables and can be dealt with separately. Thus, the $P(X^i_{t+1} \mid X^i_t)$ are different for each $i$.

2.2 Conditional dependencies and sparse data

Let's begin by introducing the following definitions:

- A static dependency denotes the conditional dependency between two variables at the same time step. It is important to notice that the problem of directed cycles arises only from this kind of dependency.
- A dynamic dependency is defined as a conditional dependency for two (e.g. classical HMM) or several variables between two time steps (e.g. factorial HMM).

Classic and hierarchical HMM contain only dynamic dependencies. However, static dependencies can be found in the case of factorial HMM when conditional dependencies are to be created between some variables.

In the scope of this paper, we consider a special kind of HMM, where the dependency type may be a priori undefined. As a matter of fact, dynamic and static dependencies are both expressed as conditional dependencies within the Bayesian network formalism.
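To make the distinction concrete, the following sketch labels each dependency as static (same time step) or dynamic (across time steps) and checks whether the static ones alone contain a directed cycle. The `(source, target, lag)` edge encoding and the function names are illustrative assumptions, not notation from the paper.

```python
from collections import defaultdict

def static_subgraph(edges):
    """Keep only same-time-step (static) dependencies.

    `edges` is a list of (source, target, lag) triples; lag == 0 marks a
    static dependency and lag >= 1 a dynamic one (hypothetical encoding).
    """
    graph = defaultdict(list)
    for src, dst, lag in edges:
        if lag == 0:
            graph[src].append(dst)
    return graph

def has_directed_cycle(graph):
    """Depth-first search with three colours to detect a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(graph) | {d for targets in graph.values() for d in targets}
    colour = {node: WHITE for node in nodes}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, []):
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)

# Dynamic edges always point forward in time, so they cannot close a cycle
# once the model is unrolled; only the static loop X -> Y -> X is a problem.
dynamic_only = [("X", "X", 1), ("Y", "Y", 1), ("X", "Y", 1), ("Y", "X", 1)]
with_static_loop = dynamic_only + [("X", "Y", 0), ("Y", "X", 0)]

print(has_directed_cycle(static_subgraph(dynamic_only)))      # False
print(has_directed_cycle(static_subgraph(with_static_loop)))  # True
```

Filtering on the lag before searching for cycles mirrors the remark above that only dependencies within one time step can produce directed cycles.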

2.3 Problem Issues

Since we consider an HMM that implements both the factorial and hierarchical extensions along with undefined dependencies, we face the problem of finding a fitted inference algorithm. As a matter of fact, there does not exist any such algorithm for this kind of model. This is the first issue: how to perform inference in such a model.

Another important issue is that, due to the original motivation (i.e. mobile robotics), we have to consider the case where there are few data to learn from. Indeed, the sampling process is supposed to be controlled by the robot's behavior and the environment, which usually gives few and biased examples. Hence, we state that a good property of our model would be to favor the learning speed even at the cost of a (reasonable) loss in accuracy.

3 Representation change : from FHHMM to Bayesian networks

3.1 Constrained representation change

Taking into account multiple dependencies: we suggest to reformulate [...] Bayesian network formalism is a well known and well grounded theoretical and practical framework.

Fig. 1. Example of representation change (BN => FBN).

However, two problems arise with such a representation change: (1) the cost of taking into account the multiple dependencies which exist for a variable (i.e. computing $P(Y \mid X_1, X_2, \dots, X_n)$, resulting in $2^n$ parameters when dealing with binary variables) and (2) reformulating a directed cycle within a Bayesian network.

Our solution relies on simplifying the constraints due to multiple dependencies. Indeed, multiple dependencies are decomposed by dealing with them two by two (i.e. taking separately $P(Y \mid X_1), P(Y \mid X_2), \dots, P(Y \mid X_n)$, resulting in $2n$ parameters for binary variables) as well as introducing constraints during the transformation process.
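The two parameter counts quoted above can be checked with a few lines of arithmetic. The sketch below generalizes them to variables sharing a common number of modalities, which is an assumption made here for simplicity; the function names are illustrative.

```python
def full_table_params(n_parents, modality=2):
    """Free parameters of P(Y | X_1, ..., X_n): one distribution per parent configuration."""
    return (modality - 1) * modality ** n_parents

def pairwise_params(n_parents, modality=2):
    """Free parameters of the n separate tables P(Y | X_1), ..., P(Y | X_n)."""
    return n_parents * (modality - 1) * modality

# For binary variables this reproduces the 2^n versus 2n comparison.
for n in (2, 4, 8, 16):
    print(n, full_table_params(n), pairwise_params(n))
```

With binary variables and 16 parents, the full table needs 65536 parameters against 32 for the pairwise decomposition, which is why the decomposition matters when only a few examples are available.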

3.2 Taking into account multiple dependencies two by two

Let $V_1, V_2, \dots, V_n$ be $n$ discrete random variables, of modality $m_1, \dots, m_n$. We assume that the $p_i = P(V_i)$ are known (vectors of size $m_i$) for all $i$, as well as some $p_{i,j} = P(V_j \mid V_i)$, $j \in I_i \subset \{1, \dots, n\}$ ($p_{i,j}$ is a matrix of size $(m_i, m_j)$).

This model can be represented by a graph where nodes are the random variables $V_i$ and edges $a_{i,j}$ represent the $p_{i,j}$. The conditional probabilities induce a structure that is not constrained (for instance, there may exist directed cycles). In order to simplify the notation, we introduce the notion of Flattened Bayesian Network (or FBN) to designate the networks that are described in the remainder of the paper. Figure 1 shows an example of representation change from a graph into a Flattened Bayesian Network.

Reformulating into Bayesian network formalism: additional variables and axioms: For each pair of dependent variables $(V_i, V_j)$, we add an additional variable whose parents are $V_i$ and $V_j$. This provides two advantages: (1) limiting the complexity of multiple dependencies (at the cost of approximation), (2) avoiding directed cycles (in the new formalism, all edges target additional variables). Once this reformulation is completed, inference is made possible thanks [...]

- Each variable $V_i$ from the original graph is mapped into a variable of the Bayesian network, with the same modality, noted $V_i$ (as before).
- Each edge $a_{i,j}$ is mapped into an additional boolean variable in the Bayesian network, noted $A_{i,j}$. The $A_{i,j}$ have exactly two parents in the Bayesian network, namely $V_i$ and $V_j$ (i.e. a V-structure). These variables are artificially observed in order to induce a dependency between the variables $V_i$ and $V_j$ (observation values are assigned to "true").
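The structural part of this mapping can be sketched in a few lines. The code below builds the parent sets of a flattened network from a list of edges and checks that the result is acyclic even when the source graph is not. The node naming scheme (`"V1"`, `"A1,2"`) and both function names are assumptions made for this illustration.

```python
def flatten(edges):
    """Map a directed graph, possibly cyclic, to the parent sets of an FBN.

    `edges` is a list of (i, j) pairs, one per dependency a_{i,j}.
    Returns a dict: node name -> tuple of parent names.
    """
    parents = {}
    for i, j in edges:
        parents.setdefault(f"V{i}", ())           # original variables keep no parents
        parents.setdefault(f"V{j}", ())
        parents[f"A{i},{j}"] = (f"V{i}", f"V{j}")  # V-structure V_i -> A_ij <- V_j
    return parents

def is_acyclic(parents):
    """Repeatedly remove nodes whose parents have all been removed."""
    remaining = dict(parents)
    removed = set()
    progress = True
    while remaining and progress:
        ready = [n for n, ps in remaining.items() if all(p in removed for p in ps)]
        progress = bool(ready)
        for n in ready:
            removed.add(n)
            del remaining[n]
    return not remaining

cyclic_source = [(1, 2), (2, 3), (3, 1)]   # V1 -> V2 -> V3 -> V1
fbn = flatten(cyclic_source)
# Every additional variable is clamped to "true" to induce the dependency.
observed = {name: True for name in fbn if name.startswith("A")}

print(is_acyclic(fbn))  # True
```

Because every edge of the flattened network points from an original variable to an additional variable, no directed path can return to its starting node, whatever the shape of the source graph.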

Once the additional variables are added, conditional probabilities must be computed as a last step of the transformation process, that is, computing the $P(A_{i,j} \mid V_i, V_j)$. Let's introduce the following notations:

- Let $K_j = \bigcup_i \{A_{i,j}\}$;
- Let $K = \bigcup_j K_j$. Let $L \subset K$. We note $L = \text{true}$ the event $\forall A \in L,\ A = \text{true}$.

Now, we shall define an axiomatic system to satisfy. The goal is to make the probabilities $P(A_{i,j} \mid V_i, V_j)$ reach a fixed point (i.e. become stable). This fixed point is reached thanks to an EM-inspired iterative algorithm which is described in the following. Satisfying this axiomatic system guarantees a coherent network behavior with respect to the dependencies taken two by two (compared to the behavior of a classic network).

The first axiom, named "behavior axiom", determines the influence of a variable onto another. This axiom specifies a property defined from $K = \text{true}$, i.e. $\forall i, j\ A_{i,j} = \text{true}$. This implies a coupled equation system. The behavior axiom is defined as follows:

$$\forall i, j \quad P(V_j \mid V_i, K = \text{true}) = p_{i,j} \qquad (1)$$

Secondly, the information contained in a probability distribution is linked to the difference between this distribution and the a priori distribution. We then introduce a second axiom named "not adding information", which states that adding additional variables does not bring information to the network. This axiom implies local constraints on the $P(A_{i,j} \mid V_i, V_j)$, i.e. independently taking into account the $A_{i,j}$. The not adding information axiom is defined as follows:

$$\forall j, \quad P(V_j \mid K = \text{true}) = p_j \qquad (2)$$
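A short calculation, given here as a reading aid and not taken from the appendix, suggests how the behavior axiom constrains each matrix $P(A_{i,j} \mid V_i, V_j)$. It assumes only the structure described above: each $A_{i,j}$ has parents $V_i$ and $V_j$ and no children, so it is independent of everything else given its parents. The symbol $q_{i,j}$ denotes the conditional obtained when $A_{i,j}$ is left unobserved, and $c_k$ is a free positive constant per row; the latter is notation introduced for this sketch.

```latex
% For a fixed value of V_i, proportionality is taken over the values of V_j.
\begin{align*}
P(V_j \mid V_i, K = \text{true})
  &\propto P(V_j \mid V_i, K \setminus \{A_{i,j}\} = \text{true})\;
           P(A_{i,j} = \text{true} \mid V_i, V_j) \\
  &= q_{i,j}(V_j \mid V_i)\; P(A_{i,j} = \text{true} \mid V_i, V_j).
\end{align*}
% Axiom (1) asks the left-hand side to equal p_{i,j}; hence, for V_i = k and V_j = l:
\[
P(A_{i,j} = \text{true} \mid V_i = k, V_j = l)
  = c_k \, \frac{p_{i,j}(k, l)}{q_{i,j}(l \mid k)} .
\]
```

Under these assumptions the first axiom fixes each row only up to the constant $c_k$, which appears to be the per-line multiplicative constant that Algorithm 2 later determines from the second axiom.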

Let's now describe the iterative process that satisfies the axioms. For more details on the equation system induced by the axioms, the reader can refer to the appendix at the end of this paper.

Satisfaction mechanism of the axiomatic system: for each iteration, there is an inter-dependency problem when computing the probabilities $P(A_{i,j} \mid V_i, V_j)$. Indeed, if an element of the matrix $P(A_{i,j} \mid V_i, V_j)$ is modified, then the axioms may be invalidated for another dependency. In practice, we check that the system [...] (updating the matrix) until it converges. This is achieved thanks to an EM-inspired iterative algorithm which is concerned with the axioms and is defined as follows:

- step E: $\forall i, j \quad q_{i,j} = P(V_j \mid V_i, K \setminus \{A_{i,j}\} = \text{true})$;
- step M: compute $P(A_{i,j} \mid V_i, V_j)$ with respect to $q_{i,j}$.
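As an illustration of one E step followed by one M step, the sketch below uses brute-force enumeration on a two-variable source graph with a directed cycle ($V_0 \to V_1 \to V_0$). Several choices are assumptions of this sketch and are not stated in the paper: the M step sets each entry proportional to $p_{i,j} / q_{i,j}$, the row constants are chosen only to keep entries in $(0, 1]$ (the paper determines them through the second axiom), and all numerical values and function names are made up.

```python
import itertools
import numpy as np

def joint_weights(priors, tables, skip=None):
    """Unnormalised P(V = v, L = true), L = all additional variables except `skip`.

    priors: list of 1-D arrays p_i; tables: dict (i, j) -> matrix T with
    T[k, l] = P(A_ij = true | V_i = k, V_j = l).
    """
    shape = [len(p) for p in priors]
    weights = np.zeros(shape)
    for v in itertools.product(*(range(m) for m in shape)):
        w = np.prod([priors[i][v[i]] for i in range(len(v))])
        for (i, j), table in tables.items():
            if (i, j) != skip:
                w *= table[v[i], v[j]]
        weights[v] = w
    return weights

def conditional(weights, i, j):
    """P(V_j | V_i, evidence) as an (m_i, m_j) matrix with rows indexed by V_i."""
    other_axes = tuple(a for a in range(weights.ndim) if a not in (i, j))
    pair = weights.sum(axis=other_axes) if other_axes else weights
    if i > j:                      # remaining axes are ordered (j, i): put V_i first
        pair = pair.T
    return pair / pair.sum(axis=1, keepdims=True)

def em_update(priors, tables, targets, edge):
    """One E step and one M step for a single edge (i, j)."""
    i, j = edge
    q = conditional(joint_weights(priors, tables, skip=edge), i, j)  # step E
    ratio = targets[edge] / q                                        # step M
    return ratio / ratio.max(axis=1, keepdims=True)  # illustrative row constants

priors = [np.array([0.6, 0.4]), np.array([0.3, 0.7])]
targets = {(0, 1): np.array([[0.9, 0.1], [0.2, 0.8]]),    # p_{0,1}
           (1, 0): np.array([[0.5, 0.5], [0.7, 0.3]])}    # p_{1,0}
tables = {edge: np.full((2, 2), 0.5) for edge in targets}  # uninformative start

tables[(0, 1)] = em_update(priors, tables, targets, (0, 1))
achieved = conditional(joint_weights(priors, tables), 0, 1)
print(np.allclose(achieved, targets[(0, 1)]))  # True
```

After this single update the behavior axiom holds for the edge that was updated. Updating the other edge afterwards generally changes $q_{0,1}$, which is the inter-dependency described above and the reason the process has to be iterated.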

At this point, this algorithm is not sufficient to make $P(A_{i,j} \mid V_i, V_j)$ converge. Thus, we have to limit the influence between variables through "limited update" constraints. In the following, we present the mechanisms which are necessary to the algorithm that will be described in the next section.

Convergence parameter: link "strength". For each arc between two variables, we introduce a new term, namely "strength", which determines the influence of one variable upon another. A zero strength means that the variable has no direct influence (i.e. same as removing the additional variable). The strength is expressed by $f$, a function defined on the set of additional variables $A_{i,j}$. $f(A_{i,j}) = (f_1(A_{i,j}), \dots, f_{m_i}(A_{i,j}))$ is a vector of size $m_i$ (number of modalities of the variable $V_i$), and $f_k(A_{i,j}) = 1 - H_k(P(A_{i,j} \mid V_i, V_j))$, where $H_k(P(A_{i,j} \mid V_i, V_j))$ is the entropy of line $k$ ($P(A_{i,j} \mid V_i, V_j)$ is a matrix).
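The text does not say how the entropy of a matrix line is normalized. The sketch below makes one plausible reading explicit: each line is rescaled to sum to 1 and its entropy is taken in base $m_j$, so that the strength lies in $[0, 1]$. Both choices, like the function names, are assumptions of this illustration.

```python
import numpy as np

def row_entropy(row):
    """Entropy of a row after normalising it to sum to 1, in log base len(row)."""
    p = np.asarray(row, dtype=float)
    p = p / p.sum()
    nonzero = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-(nonzero * np.log(nonzero)).sum() / np.log(len(p)))

def strength(table):
    """f(A_ij): one value 1 - H_k per row k of P(A_ij = true | V_i, V_j)."""
    return np.array([1.0 - row_entropy(row) for row in table])

flat = np.array([[0.5, 0.5], [0.2, 0.2]])    # A_ij ignores the value of V_j
sharp = np.array([[1.0, 0.0], [0.9, 0.1]])   # A_ij depends strongly on V_j

print(strength(flat))
print(strength(sharp))
```

Under this reading, a line that gives the same probability to every value of $V_j$ has strength 0, which matches the statement that a zero strength is equivalent to removing the additional variable.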

Updating criterion used to converge: limiting the direct influence of variables thanks to the strength term. In order to compute the influence of a variable $i$ on another variable $j$, we have to take into account both the direct influence (i.e. through an additional variable $A_{i,j}$) and the indirect influence (i.e. through the other variables on which $i$ and $j$ both depend).

For some configurations however, influences will compensate each other so that they will both tend to a limit state (the probability will tend to 0 or 1), making it difficult to take them into account any further. As a matter of fact, we shall then face (1) possibly infinite convergence towards 0 or 1 and (2) computational problems related to the computer accuracy (the latter being the most important in practice).

In order to solve this problem, we compute a maximum threshold for the strength, which is defined for every pair of variables and for every modality of the source variable such as:

Let $f^0_k(i, j) = f_k(A_{i,j})$ when $\forall i, j \quad q_{i,j} = p_j$.

This threshold is meant to be used as the link strength if there is no indirect influence. Hence, the iterative algorithm we present in the next section must satisfy at each step: $\forall i, j \quad f_k(A_{i,j}) \le f^0_k(i, j)$ (refer to algorithm 2 in the next section).
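Algorithm 2 enforces this bound by raising matrix coefficients to a power $y \in [0, 1]$ found by dichotomy. The sketch below shows such a search on a single line. It assumes the strength is computed from the normalized line with a base-$m_j$ entropy, that all entries are strictly positive, and that the exponent is applied per line; these are choices made for the illustration, and the function names are hypothetical.

```python
import numpy as np

def strength_of_row(row):
    """1 - entropy (base len(row)) of the row rescaled to sum to 1."""
    p = np.asarray(row, dtype=float)
    p = p / p.sum()
    nonzero = p[p > 0]
    return 1.0 + float((nonzero * np.log(nonzero)).sum() / np.log(len(p)))

def smoothing_exponent(row, threshold, tolerance=1e-9):
    """Bisection on y in [0, 1] so that strength_of_row(row ** y) ~= threshold."""
    row = np.asarray(row, dtype=float)
    if strength_of_row(row) <= threshold:
        return 1.0                       # already within the limit: leave unchanged
    low, high = 0.0, 1.0                 # y = 0 flattens the row (strength 0)
    while high - low > tolerance:
        mid = 0.5 * (low + high)
        if strength_of_row(row ** mid) > threshold:
            high = mid                   # still too strong: smooth more
        else:
            low = mid                    # within the limit: try smoothing less
    return low

row = np.array([0.95, 0.05])
threshold = 0.3
y = smoothing_exponent(row, threshold)
smoothed = row ** y
print(y, strength_of_row(smoothed))
```

Bisection is applicable because raising a positive line to a smaller power flattens its normalized form, so the strength increases monotonically with $y$; returning the lower bound keeps the result on the permitted side of the threshold.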

4 Representation change algorithm

In this section, we present two complementary algorithms that perform the [...] representation change is performed with respect to the axioms for any pair of variables (i.e. a single iteration which may or may not lead to convergence).

4.1 Algorithm 1: do N iterations until convergence

while the $P(A_{i,j} \mid V_i, V_j)$ have not converged (the distance from the previous term is more than a given threshold) and the number of iterations has not reached a maximum do
    call algorithm 2
    compute the distance between new and old probabilities
end while
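The outer loop can be sketched as a generic fixed-point iteration. In the code below, `update` stands in for Algorithm 2 and the distance is the largest absolute change of any matrix entry; the paper does not specify the distance, so this choice, the interface, and the toy update are assumptions.

```python
import numpy as np

def iterate_until_converged(update, tables, tolerance=1e-6, max_iterations=100):
    """Repeat `update` until the tables stop moving or the budget runs out.

    `update` takes a dict of matrices and returns a new dict with the same
    keys (hypothetical interface standing in for Algorithm 2).
    Returns (tables, iterations used, converged flag).
    """
    for iteration in range(1, max_iterations + 1):
        new_tables = update(tables)
        distance = max(np.abs(new_tables[k] - tables[k]).max() for k in tables)
        tables = new_tables
        if distance <= tolerance:
            return tables, iteration, True
    return tables, max_iterations, False

# Toy stand-in for Algorithm 2: move every entry halfway towards 0.25.
def toy_update(tables):
    return {k: 0.5 * (t + 0.25) for k, t in tables.items()}

start = {(0, 1): np.array([[0.9, 0.1], [0.4, 0.6]])}
result, steps, converged = iterate_until_converged(toy_update, start)
print(steps, converged)
```

The loop stops as soon as either condition fails, so the returned flag tells the caller whether the fixed point was reached or the iteration budget was exhausted.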

4.2 Algorithm 2: do an iteration for all the variable pairs

for all pairs of variables $V_i$, $V_j$ such that there exists a dependency $V_i \to V_j$ do
    if first iteration then
        Set all the additional variables as unobserved.
        Assign $q_{i,j} = P(V_j)$.
    else
        Set the variable $A_{i,j}$ unobserved and the other additional variables observed to true.
        Calculate $q_{i,j} = P(V_j \mid V_i, K \setminus \{A_{i,j}\} = \text{true})$ using inference in the Bayesian network. These conditional probabilities represent the indirect influence (without the link through variable $A_{i,j}$) of $V_i$ on $V_j$.
    end if
    Apply the equations of the first axiom in order to determine the $P(A_{i,j} \mid V_i, V_j)$ up to a multiplicative constant for each line $i$.
    for all lines $k$ of the matrix $P(A_{i,j} \mid V_i, V_j)$ do
        Calculate the "strength" $f_k = f_k(A_{i,j}) = 1 - H_k(P(A_{i,j} \mid V_i, V_j))$ of the link $i \to j$.
        if first iteration then
            $f^0_k(i, j) = f_k(A_{i,j})$
        else
            if $f_k > f^0_k$ then
                Calculate by dichotomy the $0 \le y \le 1$ such that $f_k(A^y_{i,j}) = f^0_k$ (i.e. all the coefficients of the matrix are raised to the power $y$). This is done in order to "smooth" the parameters, which increases the entropy and thus decreases the "strength".
            end if
        end if
    end for
    Apply the equations of the second axiom to determine the multiplicative constants.
    Compute the matrix $P(A_{i,j} \mid V_i, V_j)$.
end for

Fig. 2. [...] number of examples used for learning. The Y-axis shows the Kullback-Leibler distance between the learned joint distribution and the one that was used to generate the learning data. The generator network is shown on the figure (lower-left). The best performing Bayesian and flattened Bayesian networks for 50 examples are also shown on the figure (top).

In the next section, we show some experiments that rely on these algorithms.

5 Experiments

5.1 Experimental setup

In order to experimentally validate our approach, we conducted some experiments on the learnability of the networks after a representation change (i.e. flattened Bayesian networks). Our experimental setup is defined as follows:

- a generator network which can either be a flattened Bayesian network (exp. 1) or a classic Bayesian network (exp. 2). In both experiments, the number of nodes in the generator and learnable networks is fixed (in the case of flattened Bayesian networks, we do not count the additional nodes built by [...]
- a set of learning networks that covers all the possible classic Bayesian network and flattened Bayesian network structures with the same number of nodes as the generator (i.e. learning is exhaustive for all structures with a given size).

Fig. 3. [...] examples used for learning. The Y-axis shows the Kullback-Leibler distance between the learned joint distribution and the one that was used to generate the learning data. The generator network is shown on the figure (lower-left). The best performing Bayesian and flattened Bayesian networks for 50 examples are also shown on the figure (top).

So as to get a good approximation of the results, we compute $N$ data sequences from $M$ random initializations of the generator network. As a consequence, we perform $N \times M$ learning sessions for each target network ($20 \le N \times M \le 50$).

The error is defined as the Kullback-Leibler distance between the joint distribution of a given target network and the distribution of the generator network.
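For reference, the error measure can be computed directly from two joint probability tables. The paper does not state the direction of the divergence or the logarithm base; the sketch below assumes $KL(\text{generator} \,\|\, \text{learned})$ in nats, and the two example tables are made up.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) in nats for two joint distributions given as same-shape arrays."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    support = p > 0                       # terms with p = 0 contribute nothing
    if np.any(q[support] == 0):
        return float("inf")               # q rules out an event that p allows
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

generator = np.array([[0.4, 0.1], [0.1, 0.4]])      # joint over two binary variables
learned = np.array([[0.25, 0.25], [0.25, 0.25]])    # a learner that found no dependency

print(kl_divergence(generator, learned))
print(kl_divergence(generator, generator))  # 0.0
```

The measure is not symmetric, so a comparison between networks is only meaningful if the same direction is used for all of them.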

In the scope of this paper, the network size for all experiments is limited to 4 so that it is possible to evaluate the performance for all possible structures. As a matter of fact, the number of possible structures grows more than exponentially [...]
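To give an order of magnitude for this growth, the sketch below counts labelled directed acyclic graphs, i.e. classic Bayesian network structures, using Robinson's recurrence. The paper does not say which count it relies on, and flattened structures are not covered here; this is an illustration only.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 7):
    print(n, count_dags(n))
```

For 4 nodes this gives 543 structures, which can still be enumerated exhaustively, against 29281 for 5 nodes and 3781503 for 6.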

5.2 [...] Bayesian network

Firstly, we study the behavior of flattened Bayesian networks in the most favorable setup, i.e. when learning on data generated by a flattened Bayesian network. In this experiment, the generator is a 4-node cyclic flattened Bayesian network. Figure 2 shows this generator as well as the results obtained with all the flattened Bayesian networks and classic Bayesian networks that contain 4 nodes.

This figure shows that the flattened Bayesian networks always perform better, both on average and for the best-performing network. However, learning performance tends to be the same as the number of examples increases ($\ge 250$). Flattened Bayesian networks are thus relevant when learning from such data. Moreover, it should be noted that the best performing flattened Bayesian network is structurally different from the generator, meaning that the most reliable structure when few examples are available is not the very structure of the generator.

5.3 Experiment 2: Learning from few examples

Secondly, we choose a 4-node classic Bayesian network as data generator (cf. fig. 3). As a consequence, learning with flattened Bayesian networks faces the worst case, since the generator's joint probability can be anything. As a matter of fact, flattened Bayesian networks are supposed to be better for some distributions (unknown at this stage of our research).

Figure 3 shows the results with respect to the experimental setup described earlier. The important result is that the flattened Bayesian networks show the best results, both on average and for the best-performing network, when there are few examples to learn from. However, classic Bayesian networks become better as the number of examples grows. These results show clearly that flattened Bayesian networks pay for the advantage of learning speed with a loss in accuracy in the long term (i.e. a compromise between a fast learning curve and less accurate learning in the long term).

5.4 Discussion

According to the results obtained earlier, it appears that the best networks are also the simplest ones. Thus, it seems more relevant to learn with a simple yet inadequate structure rather than with a more complex structure that is closer to the generator: this can be seen as an explanation for the good learning capabilities of flattened Bayesian networks. Figure 4 tends to confirm this assertion by showing the distribution of classic and flattened Bayesian networks according to the learning performance for a given number of examples (here arbitrarily fixed to 50) in experiment 2. Indeed this figure shows that flattened Bayesian network