HAL Id: inria-00000548
https://hal.inria.fr/inria-00000548
Submitted on 9 Nov 2006
From Factorial and Hierarchical HMM to Bayesian Network : A Representation Change Algorithm
Sylvain Gelly, Nicolas Bredeche, Michèle Sebag
To cite this version:
Sylvain Gelly, Nicolas Bredeche, Michèle Sebag. From Factorial and Hierarchical HMM to Bayesian
Network : A Representation Change Algorithm. Symposium on Abstraction, Reformulation and
Approximation, Jul 2005, Edinburgh, Scotland, UK. ⟨inria-00000548⟩
Equipe Inference & Apprentissage - Projet TAO (INRIA Futurs),
Laboratoire de Recherche en Informatique,
Université Paris-Sud, 91405 Orsay Cedex
FRANCE
(gelly,bredeche,sebag)@lri.fr
http://tao.lri.fr
Abstract. Factorial Hierarchical Hidden Markov Models (FHHMM)
provide a powerful way to endow an autonomous mobile robot with
efficient map-building and map-navigation behaviors. However, the
inference mechanism in FHHMM has seldom been studied. In this paper,
we suggest an algorithm that transforms a FHHMM into a Bayesian
Network in order to be able to perform inference. As a matter of fact,
inference in Bayesian Networks is a well-known mechanism and this
representation formalism provides a well-grounded theoretical background
that may help us achieve our goal. The algorithm we present can
handle two problems arising in such a representation change: (1) the
cost due to taking into account multiple dependencies between variables
(e.g. computing $P(Y \mid X_1, X_2, \ldots, X_n)$), and (2) the removal of the
directed cycles that may be present in the source graph. Finally, we show
that our model is able to learn faster than a classical Bayesian network
based representation when few (or unreliable) data are available, which
is a key feature when it comes to mobile robotics.
1 Introduction
Many works in mobile robotics rely on probabilistic models (such as POMDP or
HMM¹, etc.) to build a map of an environment [2, 1, 7, 4, 5]. Indeed, the
properties of these models are particularly relevant in the context of robotics,
as well as extensions of these models. Firstly, the problem of knowledge
generalization can partly be solved if we consider a hierarchical model (encode
a given place at several granularities) [6]. Secondly, taking into account the
invariants can also be achieved if we consider a model that implements a
factorization operator (e.g. a given place location should be perceived with no
considerations for the actual
¹ In the following of the article, we deal with HMM rather than with POMDP. The
particularity of the latter being that they explicitly take into account action, which
ied separately, it is quite difficult to endow a HMM-based model with these two
simultaneously. As far as we know, there exists no efficient inference algorithm
that can deal with such a model.
In this paper, we present an approach to perform inference within a Factorial
and Hierarchical HMM (i.e. FHHMM²). Our approach relies on an algorithm
that performs a representation change from FHHMM to the Bayesian Network
representation formalism. The choice of the Bayesian Network formalism is
motivated by its strong theoretical foundations and the efficient algorithms that
exist for it.
However, several difficulties arise with such a representation change because
of the structural differences between the two formalisms and their intrinsic
properties. In particular, we identify two main problems that must be taken into
account during this process:
- There exist multiple dependencies in the FHHMM. This implies an
  exponential growth of the number of parameters to learn, which is a challenging
  problem when dealing with a small set of examples (this is an intrinsic
  property in mobile robotics);
- There exist directed cycles in the conditional dependencies between the
  variables of a FHHMM. It is well known that directed cycles are not allowed
  within a Bayesian network (we should note however that these dependencies
  are a problem only between variables at the same time step (see section 2)).
In the following section, we present the HMM formalism and the factorial
and hierarchical extensions. Then, we describe the inference problem in the case
of FHHMM. Sections 3 and 4 present our approach along with the representation
change algorithm. Lastly, section 5 presents two experiments which compare the
resulting model with classical Bayesian networks on a learning task. We conclude
this paper with a discussion about the interesting properties shown by our model
as well as the compromise we made so as to be able to learn from few data, which
is often the case for a mobile robot building a map of its environment.
2 Problem Setting
2.1 Hierarchical and Factorial HMM
Known limitations of HMM, and more generally of Markov models, concern
scaling, taking into account independent phenomena, and the difficulty to
generalize. However, there exist several extensions to solve these problems.
In the following, we focus our attention on hierarchical HMM [7, 5] and
factorial HMM [3]³.
² We use this abbreviation in the following of the article.
³ These extensions have been used separately (with POMDPs) for map-building by a
of links between the states of an HMM, and then reduce the algorithmic
complexity of learning as well as improving the accuracy. On the other hand, the
factorial extension makes it possible to explain observations with several causes
rather than only one. In this case, the goal is to turn the $P(Y \mid X)$ of HMM
into $P(Y \mid X_1, X_2, \ldots, X_n)$. The $X_i$ are hidden variables and can be dealt
with separately. Thus, the $P(X_i^{t+1} \mid X_i^t)$ are different for each $i$.
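As an illustration of hidden chains that each carry their own transition model, the sketch below propagates the prior state distribution of every chain by one time step. It is a minimal example under stated assumptions: the function names, the two binary chains and their transition values are hypothetical and do not come from the paper, and observations are ignored.

```python
def step_chain(belief, transition):
    """Propagate one chain's state distribution by one time step.

    transition[a][b] stands for P(X_i^{t+1} = b | X_i^t = a).
    """
    n_states = len(transition[0])
    return [
        sum(belief[a] * transition[a][b] for a in range(len(belief)))
        for b in range(n_states)
    ]


def step_factorial(beliefs, transitions):
    """Advance every hidden chain independently, each with its own matrix."""
    return [step_chain(b, t) for b, t in zip(beliefs, transitions)]


# Two hypothetical binary chains with different dynamics.
transitions = [
    [[0.9, 0.1], [0.2, 0.8]],  # chain 1: states tend to persist
    [[0.5, 0.5], [0.5, 0.5]],  # chain 2: memoryless
]
beliefs = [[1.0, 0.0], [1.0, 0.0]]
next_beliefs = step_factorial(beliefs, transitions)
```

Keeping one matrix per chain is what lets the factorial model store a few small tables rather than a single transition table over the joint state.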
2.2 Conditional dependencies and sparse data
Let's begin by introducing the following definitions:
- A static dependency denotes the conditional dependency between two
  variables at the same time step. It is important to notice that the problem of
  directed cycles arises only from this kind of dependency.
- A dynamic dependency is defined as a conditional dependency for two (e.g.
  classical HMM) or several variables between two time steps (e.g. factorial
  HMM).
Classic and hierarchical HMM contain only dynamic dependencies. However,
static dependencies can be found in the case of factorial HMM when conditional
dependencies are to be created between some variables.
In the scope of this paper, we consider a special kind of HMM, where the
dependency type may be a priori undefined. As a matter of fact, dynamic and
static dependencies are both expressed as conditional dependencies within the
Bayesian network formalism.
2.3 Problem Issues
Since we consider an HMM that implements both the factorial and hierarchical
extensions along with undefined dependencies, we face the problem of finding
a fitted inference algorithm. As a matter of fact, there does not exist any such
algorithm for this kind of model. This is the first issue: how to perform inference
in such a model.
Another important issue is that due to the original motivation (i.e. mobile
robotics), we have to consider the case where there is little data to learn from.
Indeed, the sampling process is supposed to be controlled by the robot's behavior
and the environment, which usually gives few and biased examples. Hence, we
state that a good property of our model would be to favor the learning speed
even at the cost of a (reasonable) loss in accuracy.
3 Representation change : from FHHMM to Bayesian networks
3.1 Constrained representation change
Taking into account multiple dependencies : we suggest to reformulate
Fig. 1. Example of representation change (BN => FBN).
Bayesian network formalism is a well-known and well-grounded theoretical and
practical framework.
However, two problems arise with such a representation change: (1) the
cost of taking into account the multiple dependencies which exist for a variable
(i.e. computing $P(Y \mid X_1, X_2, \ldots, X_n)$, resulting in $2^n$ parameters when
dealing with binary variables) and (2) reformulating a directed cycle within a
Bayesian network.
Our solution relies on simplifying the constraints due to multiple
dependencies. Indeed, multiple dependencies are decomposed by dealing with them
two by two (i.e. taking separately $P(Y \mid X_1), P(Y \mid X_2), \ldots, P(Y \mid X_n)$,
resulting in $2n$ parameters for binary variables) as well as introducing
constraints during the transformation process.
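The gap between the two parameter counts can be made concrete with a short computation. The sketch below only counts table rows for binary variables, as in the text; the function names are hypothetical.

```python
def joint_table_params(n_parents):
    """Rows of P(Y | X1, ..., Xn) for binary variables: one per parent configuration."""
    return 2 ** n_parents


def pairwise_params(n_parents):
    """Rows of P(Y | X1), ..., P(Y | Xn) taken separately: two per parent."""
    return 2 * n_parents


# Compare both counts for a growing number of parents.
for n in (2, 5, 10, 20):
    print(n, joint_table_params(n), pairwise_params(n))
```

With 20 binary parents the full table already has more than a million rows, while the two-by-two decomposition needs 40, which is why the decomposition matters when few examples are available.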
3.2 Taking into account multiple dependencies two by two
Let $V_1, V_2, \ldots, V_n$ be $n$ discrete random variables, of modality $m_1, \ldots, m_n$.
We assume that $p_i = P(V_i)$ are known (vector of size $m_i$), for all $i$, and some
$p_{i,j} = P(V_j \mid V_i)$, $j \in I_i \subset \{1, \ldots, n\}$ ($p_{i,j}$ is a matrix of size $(m_i, m_j)$).
This model can be represented by a graph where nodes are the random variables
$V_i$ and edges $a_{i,j}$ represent the $p_{i,j}$. The conditional probabilities induce a
structure that is not constrained (for instance, there may exist directed cycles). In
order to simplify the notation, we introduce the notion of Flattened Bayesian
Network (or FBN) to designate the networks that are described in the following
of the paper. Figure 1 shows an example of representation change from a graph
into a Flattened Bayesian Network.
Reformulating into Bayesian network formalism: additional variables
and axioms: For each pair of dependent variables $(V_i, V_j)$, we add an additional
variable whose parents are $V_i$ and $V_j$. This provides two advantages: (1)
limiting the complexity of multiple dependencies (at the cost of approximation), (2)
avoiding directed cycles (in the new formalism, all edges target additional
variables). Once this reformulation is completed, inference is made possible thanks
- Each variable $V_i$ from the original graph is mapped into a variable of the
  Bayesian network, with the same modality, noted $V_i$ (as before).
- Each edge $a_{i,j}$ is mapped into an additional boolean variable in the Bayesian
  network, noted $A_{i,j}$. The $A_{i,j}$ have exactly two parents in the Bayesian network,
  namely $V_i$ and $V_j$ (i.e. a V-structure). These variables are artificially observed
  in order to induce a dependency between the variables $V_i$ and $V_j$ (observation
  values are assigned to "true").
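The structural part of this mapping can be sketched in a few lines. The example below is an illustration under assumptions, not the authors' implementation: the node naming scheme (`V1`, `A1_2`), the `flatten` and `is_acyclic` helpers, and the three-node cycle are all hypothetical. It shows that a source graph containing a directed cycle yields a flattened structure without one, because every edge now points to an additional variable.

```python
def flatten(edges):
    """Map each dependency (i, j) to an extra node A_i_j with parents V_i and V_j.

    Returns a dict child -> list of parents; original variables get no parents.
    """
    parents = {}
    for i, j in edges:
        parents.setdefault(f"V{i}", [])
        parents.setdefault(f"V{j}", [])
        parents[f"A{i}_{j}"] = [f"V{i}", f"V{j}"]
    return parents


def is_acyclic(parents):
    """Depth-first check that the parent relation contains no directed cycle."""
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return True
        if state.get(node) == "visiting":
            return False  # reached a node that is still on the current path
        state[node] = "visiting"
        ok = all(visit(p) for p in parents[node])
        state[node] = "done"
        return ok

    return all(visit(node) for node in parents)


# A directed 3-cycle V1 -> V2 -> V3 -> V1 in the source graph.
cycle = [(1, 2), (2, 3), (3, 1)]
source_parents = {"V1": ["V3"], "V2": ["V1"], "V3": ["V2"]}
flat = flatten(cycle)
# The additional variables are the ones that get observed to "true".
evidence = {name: True for name in flat if name.startswith("A")}
```

Here `is_acyclic(source_parents)` is false while `is_acyclic(flat)` is true, and `evidence` lists the three additional variables that would be clamped to true.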
Once the additional variables are added, conditional probabilities must be
computed as a last step of the transformation process, that is to compute the
$P(A_{i,j} \mid V_i, V_j)$. Let's introduce the following notations:
- Let $K_j = \cup_i \{A_{i,j}\}$;
- Let $K = \cup_j K_j$. Let $L \subset K$. We note $L = \mathrm{true}$ the event $\forall A \in L,\ A = \mathrm{true}$.
Now, we shall define an axiomatic system to satisfy. The goal is to make
the probabilities $P(A_{i,j} \mid V_i, V_j)$ reach a fixed point (i.e. stable). This fixed point
is reached thanks to an EM-inspired iterative algorithm which is described in
the following. Satisfying this axiomatic system guarantees a coherent network
behavior with respect to the dependencies taken two by two (compared to the
behavior of a classic network).
The first axiom, named "behavior axiom", determines the influence of a
variable onto another. This axiom specifies a property defined from $K = \mathrm{true}$, i.e.
$\forall i, j\ A_{i,j} = \mathrm{true}$. Then, this implies a coupled equation system. The behavior
axiom is defined as follows:
$\forall i, j \quad P(V_j \mid V_i, K = \mathrm{true}) = p_{i,j}$ (1)
Secondly, the information contained in a probability distribution is linked to
the difference between this distribution and the a priori distribution. We then
introduce a second axiom named "not adding information" which states that
adding additional variables does not bring information to the network. Then, this
axiom implies local constraints on the $P(A_{i,j} \mid V_i, V_j)$, i.e. independently taking
into account the $A_{i,j}$. The not adding information axiom is defined as follows:
$\forall j, \quad P(V_j \mid K = \mathrm{true}) = p_j$ (2)
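Both axioms can be checked by brute-force enumeration on the smallest possible case: one dependency between two binary variables, so that $K$ contains a single additional variable. The sketch below is an illustration under assumptions, not the general solution of the coupled system: the numerical values are hypothetical, the prior of $V_j$ is chosen to be consistent with $p_i$ and $p_{i,j}$, and the candidate table is simply taken proportional to $p_{i,j} / p_j$.

```python
from itertools import product

p_i = [0.3, 0.7]                  # prior P(Vi), hypothetical values
p_ij = [[0.9, 0.1], [0.2, 0.8]]   # target P(Vj | Vi), rows indexed by Vi
# Assumption of this sketch: the prior of Vj is consistent with p_i and p_ij.
p_j = [sum(p_i[a] * p_ij[a][b] for a in range(2)) for b in range(2)]

# Candidate P(A = true | Vi, Vj): proportional to p_ij / p_j, scaled to stay <= 1.
ratio = [[p_ij[a][b] / p_j[b] for b in range(2)] for a in range(2)]
scale = 1.0 / max(max(row) for row in ratio)
p_a_true = [[scale * r for r in row] for row in ratio]

# Joint weight of (Vi, Vj) once A is observed to true (Vi and Vj are root nodes).
w = {(a, b): p_i[a] * p_j[b] * p_a_true[a][b]
     for a, b in product(range(2), repeat=2)}
total = sum(w.values())

# Axiom (1): P(Vj | Vi, A = true) should equal p_ij.
cond = [[w[a, b] / (w[a, 0] + w[a, 1]) for b in range(2)] for a in range(2)]
# Axiom (2): P(Vj | A = true) should equal p_j.
marg = [(w[0, b] + w[1, b]) / total for b in range(2)]
```

In this single-edge case `cond` reproduces `p_ij` and `marg` reproduces `p_j`. With several coupled dependencies no such closed form is available in general, which is what motivates the iterative procedure described in the text.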
Let's now describe the iterative process that satisfies the axioms. For more
details on the equation system induced by the axioms, the reader can refer to
the appendix at the end of this paper.
Satisfaction mechanism of the axiomatic system: for each iteration, there
is an inter-dependency problem when computing the probabilities $P(A_{i,j} \mid V_i, V_j)$⁴.
Indeed, if an element of the matrix $P(A_{i,j} \mid V_i, V_j)$ is modified, then the axioms
may be invalidated for another dependency. In practice, we check that the system
4
(updating the matrix) until it converges. This is achieved thanks to an
EM-inspired iterative algorithm which is concerned with the axioms and is defined
as follows:
- step E: $\forall i, j \quad q_{i,j} = P(V_j \mid V_i, K \setminus \{A_{i,j}\} = \mathrm{true})$;
- step M: compute $P(A_{i,j} \mid V_i, V_j)$ wrt. $q_{i,j}$.
At this point, this algorithm is not sufficient to make $P(A_{i,j} \mid V_i, V_j)$ converge.
Thus, we have to limit the influence between variables through "limited update"
constraints. In the following, we present the mechanisms which are necessary to
the algorithm that will be described in the next section.
Convergence parameter : link "strength". For each arc between two variables,
we introduce a new term, namely "strength", which determines the influence
of one variable upon another. A zero strength means that the variable has no
direct influence (i.e. same as removing the additional variable). The strength is
expressed by $f$, a function defined on the set of additional variables $A_{i,j}$.
$f(A_{i,j}) = (f_1(A_{i,j}), \ldots, f_{m_i}(A_{i,j}))$ is a vector of size $m_i$ (number of modalities
of the variable $V_i$), and $f_k(A_{i,j}) = 1 - H_k(P(A_{i,j} \mid V_i, V_j))$ where
$H_k(P(A_{i,j} \mid V_i, V_j))$ is the entropy of line $k$ ($P(A_{i,j} \mid V_i, V_j)$ is a matrix).
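The strength vector can be computed directly from the matrix, one entry per line. The text does not say how the entropy of a line is normalised, so the sketch below makes two explicit assumptions: each line is rescaled to sum to 1, and the logarithm is taken in base $m_j$ so that a uniform line has entropy 1 and therefore strength 0. The function names and the example matrices are hypothetical.

```python
import math


def row_entropy(row):
    """Entropy of one matrix line, normalised to [0, 1].

    Assumption: the line is rescaled to sum to 1 and the logarithm is taken
    in base len(row), so that a uniform line has entropy 1.
    """
    total = sum(row)
    probs = [x / total for x in row]
    return -sum(p * math.log(p, len(row)) for p in probs if p > 0)


def strength(matrix):
    """Vector (f_1, ..., f_mi) with f_k = 1 - H_k for each line k."""
    return [1.0 - row_entropy(row) for row in matrix]


no_influence = [[0.5, 0.5], [0.5, 0.5]]   # uniform lines: zero strength
strong_link = [[0.99, 0.01], [0.5, 0.5]]  # first line is nearly deterministic
```

Under these assumptions `strength(no_influence)` is zero on both lines, which matches the statement that a zero strength amounts to removing the additional variable, and the first line of `strong_link` has a strength above 0.9.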
Updating criterion used to converge : limiting the direct influence of variables
thanks to the strength term. In order to compute the influence of a variable $i$ on
another variable $j$, we have to take into account both the direct influence (i.e.
through an additional variable $A_{i,j}$) and the indirect influence (i.e. through the
other variables on which $i$ and $j$ both depend).
For some configurations however, influences will compensate each other so
that they will both tend to a limit state (probability will tend to 0 or 1), making
it difficult to take them into account any further. As a matter of fact, we shall
then face (1) possibly infinite convergence towards 0 or 1 and (2) computational
problems related to the computer accuracy (the latter being the most important
in practice).
In order to solve this problem, we compute a maximum threshold for the
strength, which is defined for every pair of variables and for every modality of
the source variable such as:
Let $f_k^0(i, j) = f_k(A_{i,j})$ when $\forall i, j \quad q_{i,j} = p_j$.
This threshold is meant to be used as the link strength if there is no indirect
influence. Hence, the iterative algorithm we present in the next section must
satisfy for each step: $\forall i, j \quad f_k(A_{i,j}) \le f_k^0(i, j)$ (refer to algorithm 2 in the next
section).
4 Representation change algorithm
In this section, we present two complementary algorithms that perform the de-
representation change is performed with respect to the axioms for any pair of
variables (i.e. a single iteration which may or may not lead to convergence).
4.1 Algorithm 1 : do N iterations until convergence
while the $P(A_{i,j} \mid V_i, V_j)$ have not converged (distance from the previous term is
    more than a given threshold) and the number of iterations has not reached a
    maximum do
  call algorithm 2
  compute the distance between new and old probabilities
end while
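The outer loop of Algorithm 1 can be sketched as a small driver that repeatedly calls a single-iteration routine. This is an illustration under assumptions: the paper does not specify the distance, so the sketch uses the largest absolute change of any table entry; `run_until_converged`, its default threshold and iteration budget, and the stand-in `toy_update` are all hypothetical, and `toy_update` does not implement Algorithm 2.

```python
def run_until_converged(tables, one_iteration, threshold=1e-6, max_iterations=100):
    """Repeat `one_iteration` until the tables stop moving or the budget is spent.

    tables: dict mapping an edge (i, j) to its matrix P(A_ij = true | Vi, Vj).
    one_iteration: callable playing the role of Algorithm 2; it receives the
        tables and the iteration index and returns the updated tables.
    Assumes max_iterations >= 1.
    """
    for iteration in range(max_iterations):
        new_tables = one_iteration(tables, iteration)
        # Assumed distance: largest absolute change of any entry.
        distance = max(
            abs(new - old)
            for edge in tables
            for new_row, old_row in zip(new_tables[edge], tables[edge])
            for new, old in zip(new_row, old_row)
        )
        tables = new_tables
        if distance <= threshold:
            break
    return tables, iteration + 1


def toy_update(tables, iteration):
    # Stand-in for Algorithm 2: move every entry halfway towards 0.25.
    return {e: [[(x + 0.25) / 2 for x in row] for row in m]
            for e, m in tables.items()}


final, n_iter = run_until_converged({(1, 2): [[1.0, 0.0], [0.5, 0.5]]}, toy_update)
```

The loop stops as soon as either condition fails, which is the reading adopted here for the two stopping conditions of Algorithm 1.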
4.2 Algorithm 2 : do an iteration for all the variable pairs
for all pairs of variables $V_i$, $V_j$ such that there exists a dependency $V_i \to V_j$ do
  if first iteration then
    Set all the additional variables as unobserved.
    Assign $q_{i,j} = P(V_j)$.
  else
    Set the variable $A_{i,j}$ unobserved and the other additional variables
    observed to true.
    Calculate $q_{i,j} = P(V_j \mid V_i, K \setminus \{A_{i,j}\} = \mathrm{true})$ using an inference in the
    Bayesian network. These conditional probabilities represent the indirect
    influence (without the link through variable $A_{i,j}$) of $V_i$ on $V_j$.
  end if
  Apply the equations of the first axiom in order to determine the $P(A_{i,j} \mid V_i, V_j)$
  up to a multiplicative constant for each line $i$.
  for all lines $k$ of the matrix $P(A_{i,j} \mid V_i, V_j)$, calculate the "strength"
      $f_k = f_k(A_{i,j}) = 1 - H_k(P(A_{i,j} \mid V_i, V_j))$ of the link $i \to j$ do
    if first iteration then
      $f_k^0(i, j) = f_k(A_{i,j})$
    else
      if $f_k > f_k^0$ then
        Calculate by dichotomy the $0 \le y \le 1$ such that $f_k(A_{i,j}^y) = f_k^0$ (i.e. all
        the coefficients of the matrix are raised to the power $y$). This is done in
        order to "smooth" the parameters to increase the entropy and then
        decrease the "strength".
      end if
    end if
  end for
  Apply the equations of the second axiom to determine the multiplicative
  constants.
  Compute the matrix $P(A_{i,j} \mid V_i, V_j)$.
end for
[Figure caption, start missing:] number of examples used for learning. The Y-axis
shows the Kullback-Leibler distance between the learned joint distribution and the
one that was used to generate the learning data. The generator network is shown on
the figure (lower-left). The best performing Bayesian and flattened Bayesian networks
for 50 examples are also shown on the figure (top).
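The smoothing step of Algorithm 2, which searches by dichotomy for an exponent $y$ that brings the strength of a line back under its threshold, can be sketched as a bisection. The sketch rests on assumptions: the entropy normalisation is the hypothetical one used for illustration (line rescaled to sum to 1, logarithm in base $m_j$), the entries of the line are strictly positive, and the names `strength_of_row` and `smooth_row` are not from the paper. Bisection is applicable because raising a line to a power $y$ in $[0, 1]$ flattens it: $y = 0$ gives a uniform line with zero strength and $y = 1$ leaves the line unchanged.

```python
import math


def strength_of_row(row):
    """1 minus the normalised entropy of a line (line rescaled to sum to 1)."""
    total = sum(row)
    probs = [x / total for x in row]
    entropy = -sum(p * math.log(p, len(row)) for p in probs if p > 0)
    return 1.0 - entropy


def smooth_row(row, target, tolerance=1e-9):
    """Find y in [0, 1] by bisection so that strength(row ** y) is close to target.

    Assumes strictly positive entries and 0 <= target <= strength(row).
    """
    low, high = 0.0, 1.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if strength_of_row([x ** mid for x in row]) > target:
            high = mid  # still too strong: flatten more
        else:
            low = mid   # within the threshold: try a larger exponent
    y = low  # the lower bound always respects strength <= target
    return y, [x ** y for x in row]


# Hypothetical line whose strength (about 0.92) exceeds a threshold of 0.5.
y, smoothed = smooth_row([0.99, 0.01], target=0.5)
```

Returning the lower end of the bracket keeps the smoothed line on the admissible side of the constraint $f_k \le f_k^0$.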
In the next section, we show some experiments that rely on these algorithms.
5 Experiments
5.1 Experimental setup
In order to experimentally validate our approach, we conducted some
experiments on the learnability of the networks after a representation change (i.e.
flattened Bayesian networks). Our experimental setup is defined as follows:
- a generator network which can either be a flattened Bayesian network (exp.
  1) or a classic Bayesian network (exp. 2). In both experiments, the number
  of nodes in the generator and learnable networks is fixed (in the case of
  flattened Bayesian network, we do not count the additional nodes built by
- a set of learning networks that covers both all the possible classic Bayesian
  networks and flattened Bayesian network structures with the same number
  of nodes as the generator (i.e. learning is exhaustive for all structures with
  a given size).
[Figure caption, start missing:] examples used for learning. The Y-axis shows the
Kullback-Leibler distance between the learned joint distribution and the one that
was used to generate the learning data. The generator network is shown on the
figure (lower-left). The best performing Bayesian and flattened Bayesian networks
for 50 examples are also shown on the figure (top).
So as to get a good approximation of the results, we compute $N$ data sequences
from $M$ random initializations for the generator network. As a consequence, we
perform $N \times M$ learning sessions for each target network ($20 \le N \times M \le 50$).
The error is defined as the Kullback-Leibler distance between the joint
distribution of a given target network and the distribution of the generator network.
In the scope of this paper, the network size for all experiments is limited to 4 so
that it is possible to evaluate the performance for all possible structures. As a
matter of fact, the number of possible structures grows more than exponentially
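Two quantities from this setup are easy to compute explicitly: the Kullback-Leibler error between two joint distributions, and the number of candidate structures for a given number of nodes. The sketch below is illustrative and rests on assumptions: the paper does not state the direction of the divergence, so `kl_divergence(p, q)` is written generically; `count_dags` uses Robinson's recurrence and only counts classic Bayesian network structures (labelled directed acyclic graphs), not flattened ones; and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) between two joint distributions.

    p and q map each joint assignment to its probability; q must be positive
    wherever p is.
    """
    return sum(pv * math.log(pv / q[x]) for x, pv in p.items() if pv > 0)


def count_dags(n):
    """Number of labelled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    counts = [1]
    for size in range(1, n + 1):
        counts.append(sum(
            (-1) ** (k + 1) * math.comb(size, k) * 2 ** (k * (size - k)) * counts[size - k]
            for k in range(1, size + 1)
        ))
    return counts[n]


# Hypothetical joint distributions over two binary variables.
generator = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
learned = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
error = kl_divergence(generator, learned)
n_structures = [count_dags(n) for n in range(1, 6)]
```

`n_structures` starts 1, 3, 25, 543, 29281, so exhaustive evaluation is practical for 4 nodes and becomes costly very quickly beyond that.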
Bayesian network
Firstly, we study the behavior of flattened Bayesian networks in the most
favorable setup, i.e. when learning on data generated by a flattened Bayesian network.
In this experiment, the generator is a 4-node cyclic flattened Bayesian network.
Figure 2 shows this generator as well as the results obtained with both all the
flattened Bayesian networks and classic Bayesian networks that contain 4 nodes.
This figure shows that the flattened Bayesian networks always perform better
for average and best performances. However, learning performance tends to be
the same as the number of examples increases (≥ 250). Flattened Bayesian
networks are thus relevant when learning from such data. Moreover it should
be noted that the best performing flattened Bayesian network is structurally
different from the generator, meaning that the most reliable structure when few
examples are available is not the very structure of the generator.
5.3 Experiment 2 : Learning from few examples
Secondly, we choose a 4-node classic Bayesian network as data generator (cf. fig.
3). As a consequence, learning with flattened Bayesian networks faces the worst
case since the generator's joint probability can be anything. As a matter of fact,
flattened Bayesian networks are supposed to be better for some distributions
(unknown at this stage of our research).
Figure 3 shows the results with respect to the experimental setup described
earlier. The important result is that the flattened Bayesian networks show the
best results both in average and for the best when there are few examples to
learn from. However, classic Bayesian networks become better as the number of
examples grows. These results show clearly that flattened Bayesian networks pay
for the advantage of learning speed with a loss in accuracy in the long term (i.e.
a compromise between a fast learning curve and non-accurate learning in the
long term).
5.4 Discussion
According to the results obtained earlier, it appears that the best networks are
also the simplest ones. Thus, it seems more relevant to learn with a simple yet
inadequate structure rather than with a more complex structure that is closer
to the generator: this can be seen as an explanation for the good learning
capabilities of flattened Bayesian networks. Figure 4 tends to confirm this assertion
by showing the distribution of classic and flattened Bayesian networks according
to the learning performance for a given number of examples (here arbitrarily fixed
to 50) in experiment 2. Indeed this figure shows that flattened Bayesian network