
From Factorial and Hierarchical HMM to Bayesian Network : A Representation Change Algorithm


HAL Id: inria-00000548

https://hal.inria.fr/inria-00000548

Submitted on 9 Nov 2006

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Sylvain Gelly, Nicolas Bredeche, Michèle Sebag

To cite this version:

Sylvain Gelly, Nicolas Bredeche, Michèle Sebag. From Factorial and Hierarchical HMM to Bayesian Network : A Representation Change Algorithm. Symposium on Abstraction, Reformulation and Approximation, Jul 2005, Edinburgh, Scotland, UK. ⟨inria-00000548⟩


Sylvain Gelly, Nicolas Bredeche, and Michèle Sebag

Equipe Inference & Apprentissage - Projet TAO (INRIA Futurs),
Laboratoire de Recherche en Informatique,
Université Paris-Sud, 91405 Orsay Cedex, FRANCE

(gelly,bredeche,sebag)@lri.fr
http://tao.lri.fr

Abstract. Factorial Hierarchical Hidden Markov Models (FHHMM) provide a powerful way to endow an autonomous mobile robot with efficient map-building and map-navigation behaviors. However, the inference mechanism in FHHMM has seldom been studied. In this paper, we suggest an algorithm that transforms a FHHMM into a Bayesian Network in order to be able to perform inference. As a matter of fact, inference in Bayesian Networks is a well-known mechanism, and this representation formalism provides a well-grounded theoretical background that may help us to achieve our goal. The algorithm we present can handle two problems arising in such a representation change: (1) the cost due to taking into account multiple dependencies between variables (e.g. computing $P(Y \mid X_1, X_2, \ldots, X_n)$), and (2) the removal of the directed cycles that may be present in the source graph. Finally, we show that our model is able to learn faster than a classical Bayesian network based representation when few (or unreliable) data are available, which is a key feature when it comes to mobile robotics.

1 Introduction

Many works in mobile robotics rely on probabilistic models (such as POMDP or HMM¹, etc.) to build a map of an environment [2,1,7,4,5]. Indeed, the properties of these models are particularly relevant in the context of robotics, as are extensions of these models. Firstly, the problem of knowledge generalization can partly be solved if we consider a hierarchical model (encoding a given place at several granularities) [6]. Secondly, taking into account the invariants can also be achieved if we consider a model that implements a factorization operator (e.g. a given place location should be perceived with no considerations for the actual [...]

¹ In the rest of the article, we deal with HMMs rather than with POMDPs. The particularity of the latter is that they explicitly take actions into account, which [...]

[...]ied separately, it is quite difficult to endow an HMM-based model with these two simultaneously. As far as we know, there exists no efficient inference algorithm that can deal with such a model.

In this paper, we present an approach to perform inference within a Factorial and Hierarchical HMM (i.e. FHHMM²). Our approach relies on an algorithm that performs a representation change from FHHMM to the Bayesian Network representation formalism. The choice of the Bayesian Network formalism is motivated by its strong theoretical foundations and the efficient algorithms that exist for it.

However, several difficulties arise with such a representation change because of the structural differences between the two formalisms and their intrinsic properties. In particular, we identify two main problems that must be taken into account during this process:

- There exist multiple dependencies in the FHHMM. This implies an exponential growth of the number of parameters to learn, which is a challenging problem when dealing with a small set of examples (an intrinsic property of mobile robotics);

- There exist directed cycles in the conditional dependencies between the variables of a FHHMM. It is well known that directed cycles are not allowed within a Bayesian network (we should note however that these dependencies are a problem only between variables at the same time step, see section 2).

In the following section, we present the HMM formalism and the factorial and hierarchical extensions. Then, we describe the inference problem in the case of FHHMM. Sections 3 and 4 present our approach along with the representation change algorithm. Lastly, section 5 presents two experiments which compare the resulting model with classical Bayesian networks on a learning task. We conclude this paper with a discussion about the interesting properties shown by our model as well as the compromise we made so as to be able to learn from little data, which is often the case for a mobile robot building a map of its environment.

2 Problem Setting

2.1 Hierarchical and Factorial HMM

Known limitations of HMMs, and more generally of Markov models, concern scaling, taking into account independent phenomena, and the difficulty to generalize. However, there exist several extensions that address these problems. In the following, we focus our attention on hierarchical HMM [7,5] and factorial HMM [3]³.

² We use this abbreviation in the rest of the article.

³ These extensions have been used separately (with POMDPs) for map-building by a [...]

[...] of links between the states of an HMM, and thus to reduce the algorithmic complexity of learning as well as to improve the accuracy. On the other hand, the factorial extension makes it possible to explain observations with several causes rather than only one. In this case, the goal is to turn the $P(Y \mid X)$ of HMM into $P(Y \mid X_1, X_2, \ldots, X_n)$. The $X_i$ are hidden variables and can be dealt with separately. Thus, the $P(X_i^{t+1} \mid X_i^t)$ are different for each $i$.
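The factorial structure described above can be illustrated with a small sampling sketch. It is only an illustration under stated assumptions: the function name `sample_factorial_hmm`, the two binary chains, their transition matrices and the deterministic emission (the sum of the hidden states) are all hypothetical and do not come from the paper.

```python
import random


def sample_factorial_hmm(transitions, initial, emit, steps, rng):
    """Sample hidden chains X_1..X_n and observations Y for `steps` time steps.

    transitions[i][a][b] = P(X_i^{t+1} = b | X_i^t = a), one matrix per chain i.
    initial[i][a]        = P(X_i^0 = a).
    emit(states, rng)    = draws Y given the tuple of all hidden states.
    """
    def draw(dist):
        return rng.choices(range(len(dist)), weights=dist, k=1)[0]

    states = [draw(p0) for p0 in initial]
    trajectory = []
    for _ in range(steps):
        current = tuple(states)
        # The observation depends on all hidden variables at once.
        trajectory.append((current, emit(current, rng)))
        # Each chain evolves on its own, using its own transition matrix.
        states = [draw(transitions[i][s]) for i, s in enumerate(states)]
    return trajectory


# Hypothetical example: a slowly changing chain and a fast one.
transitions = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.5, 0.5]],
]
initial = [[1.0, 0.0], [0.5, 0.5]]
trajectory = sample_factorial_hmm(
    transitions, initial, lambda s, rng: sum(s), steps=5, rng=random.Random(0)
)
```

Each element of `trajectory` pairs the tuple of hidden states with the observation they jointly produced, which is the situation modelled by $P(Y \mid X_1, X_2, \ldots, X_n)$.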

2.2 Conditional dependencies and sparse data

Let us begin by introducing the following definitions:

- A static dependency denotes the conditional dependency between two variables at the same time step. It is important to notice that the problem of directed cycles arises only from this kind of dependency.

- A dynamic dependency is defined as a conditional dependency between two variables (e.g. classical HMM) or several variables (e.g. factorial HMM) across two time steps.

Classic and hierarchical HMMs contain only dynamic dependencies. However, static dependencies can be found in the case of factorial HMMs when conditional dependencies are to be created between some variables.

In the scope of this paper, we consider a special kind of HMM, where the type of the dependencies may be a priori undefined. As a matter of fact, dynamic and static dependencies are both expressed as conditional dependencies within the Bayesian network formalism.

2.3 Problem Issues

Since we consider an HMM that implements both the factorial and hierarchical extensions along with undefined dependencies, we face the problem of finding a suitable inference algorithm. As a matter of fact, no such algorithm exists for this kind of model. This is the first issue: how to perform inference in such a model.

Another important issue is that, due to the original motivation (i.e. mobile robotics), we have to consider the case where there is little data to learn from. Indeed, the sampling process is supposed to be controlled by the robot's behavior and the environment, which usually gives few and biased examples. Hence, we state that a good property of our model would be to favor the learning speed even at the cost of a (reasonable) loss in accuracy.

3 Representation change: from FHHMM to Bayesian networks

3.1 Constrained representation change

Taking into account multiple dependencies: we suggest to reformulate [...]

Fig. 1. Example of representation change (BN => FBN).

[...] Bayesian network formalism is a well known and grounded theoretical and practical framework.

However, two problems arise with such a representation change: (1) the cost of taking into account the multiple dependencies which exist for a variable (i.e. computing $P(Y \mid X_1, X_2, \ldots, X_n)$, resulting in $2^n$ parameters when dealing with binary variables) and (2) reformulating a directed cycle within a Bayesian network.

Our solution relies on simplifying the constraints due to multiple dependencies. Indeed, multiple dependencies are decomposed by dealing with them two by two (i.e. taking separately $P(Y \mid X_1), P(Y \mid X_2), \ldots, P(Y \mid X_n)$, resulting in $2n$ parameters for binary variables) as well as by introducing constraints during the transformation process.
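The gap between the two parameter counts can be made concrete with a short sketch. The function names are hypothetical, and the generalization to variables with more than two values (counting free parameters, all variables sharing the same arity) is an assumption added for illustration; for binary variables it reduces to the $2^n$ and $2n$ quoted above.

```python
def joint_parameters(n_parents, arity=2):
    """Free parameters of the full table P(Y | X_1, ..., X_n)."""
    # One distribution over Y (arity - 1 free values) per parent configuration.
    return (arity - 1) * arity ** n_parents


def pairwise_parameters(n_parents, arity=2):
    """Free parameters when P(Y | X_1), ..., P(Y | X_n) are kept separately."""
    # Each P(Y | X_i) has one distribution over Y per value of X_i.
    return (arity - 1) * arity * n_parents


for n in (2, 5, 10, 20):
    print(n, joint_parameters(n), pairwise_parameters(n))
```

With ten binary parents the full table already needs 1024 parameters against 20 for the two-by-two decomposition, which is why the decomposition matters when only a few examples are available.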

3.2 Taking into account multiple dependencies two by two

Let $V_1, V_2, \ldots, V_n$ be $n$ discrete random variables, of modality $m_1, \ldots, m_n$. We assume that the $p_i = P(V_i)$ are known (vectors of size $m_i$) for all $i$, as well as some $p_{i,j} = P(V_j \mid V_i)$, $j \in I_i \subset \{1, \ldots, n\}$ ($p_{i,j}$ is a matrix of size $(m_i, m_j)$).

This model can be represented by a graph where nodes are the random variables $V_i$ and edges $a_{i,j}$ represent the $p_{i,j}$. The conditional probabilities induce a structure that is not constrained (for instance, there may exist directed cycles). In order to simplify the notation, we introduce the notion of Flattened Bayesian Network (or FBN) to designate the networks that are described in the rest of the paper. Figure 1 shows an example of representation change from a graph into a Flattened Bayesian Network.

Reformulating into Bayesian network formalism: additional variables and axioms: For each pair of dependent variables $(V_i, V_j)$, we add an additional variable whose parents are $V_i$ and $V_j$. This provides two advantages: (1) limiting the complexity of multiple dependencies (at the cost of approximation), (2) avoiding directed cycles (in the new formalism, all edges target additional variables). Once this reformulation is completed, inference is made possible thanks [...]

- Each variable $V_i$ from the original graph is mapped into a variable of the Bayesian network, with the same modality, noted $V_i$ (as before).

- Each edge $a_{i,j}$ is mapped into an additional boolean variable in the Bayesian network, noted $A_{i,j}$. The $A_{i,j}$ have exactly two parents in the Bayesian network, namely $V_i$ and $V_j$ (i.e. a V-structure). These variables are artificially observed in order to induce a dependency between the variables $V_i$ and $V_j$ (observation values are assigned to "true").
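The structural part of this mapping can be sketched in a few lines. The node naming scheme (`V1`, `A1,2`) and the helper names `flatten` and `is_acyclic` are hypothetical; the sketch only builds the parent sets and does not compute any probability.

```python
def flatten(edges):
    """Map each dependency (i, j) to an additional node A_i,j with parents V_i and V_j."""
    parents = {}
    for i, j in edges:
        parents.setdefault(f"V{i}", [])   # original variables keep no parents
        parents.setdefault(f"V{j}", [])
        parents[f"A{i},{j}"] = [f"V{i}", f"V{j}"]
    return parents


def is_acyclic(parents):
    """Repeatedly remove the nodes whose parents have all been removed already."""
    remaining = dict(parents)
    while remaining:
        ready = [n for n, ps in remaining.items()
                 if not any(p in remaining for p in ps)]
        if not ready:
            return False  # every remaining node still waits for a parent: a cycle
        for n in ready:
            del remaining[n]
    return True


# A source graph with a directed cycle V1 -> V2 -> V3 -> V1.
cycle = [(1, 2), (2, 3), (3, 1)]
source_graph = {"V1": ["V3"], "V2": ["V1"], "V3": ["V2"]}
fbn = flatten(cycle)
print(is_acyclic(source_graph), is_acyclic(fbn))
```

Because every edge of the flattened network points to an additional variable, and additional variables have no children, the result is acyclic whatever the source graph looks like.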

Once the additional variables are added, conditional probabilities must be computed as the last step of the transformation process, that is, computing the $P(A_{i,j} \mid V_i, V_j)$. Let us introduce the following notations:

- Let $K_j = \cup_i \{A_{i,j}\}$;

- Let $K = \cup_j K_j$. Let $L \subset K$. We note $L = \mathrm{true}$ the event $\forall A \in L,\ A = \mathrm{true}$.

Now, we shall define an axiomatic system to satisfy. The goal is to make the probabilities $P(A_{i,j} \mid V_i, V_j)$ reach a fixed point (i.e. become stable). This fixed point is reached thanks to an EM-inspired iterative algorithm which is described in the following. Satisfying this axiomatic system guarantees a coherent network behavior with respect to the dependencies taken two by two (compared to the behavior of a classic network).

The first axiom, named "behavior axiom", determines the influence of a variable onto another. This axiom specifies a property defined from $K = \mathrm{true}$, i.e. $\forall i, j\ A_{i,j} = \mathrm{true}$. This implies a coupled equation system. The behavior axiom is defined as follows:

$\forall i, j \quad P(V_j \mid V_i, K = \mathrm{true}) = p_{i,j}$ (1)

Secondly, the information contained in a probability distribution is linked to the difference between this distribution and the a priori distribution. We then introduce a second axiom named "not adding information", which states that adding additional variables does not bring information to the network. This axiom implies local constraints on the $P(A_{i,j} \mid V_i, V_j)$, i.e. constraints that take the $A_{i,j}$ into account independently. The "not adding information" axiom is defined as follows:

$\forall j \quad P(V_j \mid K = \mathrm{true}) = p_j$ (2)
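As a sanity check of the two axioms, the single-edge case ($V_1 \to V_2$, with one additional variable $A_{1,2}$ observed to true) can be worked out by brute-force enumeration. The sketch below is an illustration and not the paper's algorithm: the closed form used for $P(A_{1,2} = \mathrm{true} \mid V_1, V_2)$, proportional to $p_{1,2} / p_2$, is an assumption that is only claimed for this single-edge case, and the function names and numerical values are hypothetical.

```python
def single_edge_cpt(p_cond, p_child):
    """Assumed P(A = true | V1 = a, V2 = b), proportional to p_cond[a][b] / p_child[b]."""
    ratio = [[p_cond[a][b] / p_child[b] for b in range(len(p_child))]
             for a in range(len(p_cond))]
    scale = 1.0 / max(max(row) for row in ratio)  # keep every entry in [0, 1]
    return [[scale * r for r in row] for row in ratio]


def posterior_given_true(p_parent, p_child, cpt):
    """P(V1, V2 | A = true) by enumeration; V1 and V2 are independent a priori."""
    joint = [[p_parent[a] * p_child[b] * cpt[a][b] for b in range(len(p_child))]
             for a in range(len(p_parent))]
    z = sum(map(sum, joint))
    return [[x / z for x in row] for row in joint]


p1 = [0.3, 0.7]                                   # P(V1)
p12 = [[0.9, 0.1], [0.2, 0.8]]                    # P(V2 | V1)
p2 = [sum(p1[a] * p12[a][b] for a in range(2)) for b in range(2)]  # P(V2)

cpt = single_edge_cpt(p12, p2)
post = posterior_given_true(p1, p2, cpt)
cond = [[x / sum(row) for x in row] for row in post]           # P(V2 | V1, A = true)
marg = [sum(post[a][b] for a in range(2)) for b in range(2)]   # P(V2 | A = true)
print(cond, marg)
```

In this toy case `cond` reproduces $p_{1,2}$ (axiom 1) and `marg` reproduces $p_2$ (axiom 2). With several edges the tables interact, which is what the iterative procedure is meant to handle.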

Let us now describe the iterative process that satisfies the axioms. For more details on the equation system induced by the axioms, the reader can refer to the appendix at the end of this paper.

Satisfaction mechanism of the axiomatic system: for each iteration, there is an inter-dependency problem when computing the probabilities $P(A_{i,j} \mid V_i, V_j)$⁴. Indeed, if an element of the matrix $P(A_{i,j} \mid V_i, V_j)$ is modified, then the axioms may be invalidated for another dependency. In practice, we check that the system [...]

⁴ [...]

[...] (updating the matrix) until it converges. This is achieved thanks to an EM-inspired iterative algorithm which is concerned with the axioms and is defined as follows:

- step E: $\forall i, j \quad q_{i,j} = P(V_j \mid V_i, K \setminus \{A_{i,j}\} = \mathrm{true})$;

- step M: compute $P(A_{i,j} \mid V_i, V_j)$ w.r.t. $q_{i,j}$.

At this point, this algorithm is not sufficient to make $P(A_{i,j} \mid V_i, V_j)$ converge. Thus, we have to limit the influence between variables through "limited update" constraints. In the following, we present the mechanisms which are necessary to the algorithm that will be described in the next section.

Convergence parameter: link "strength". For each arc between two variables, we introduce a new term, namely "strength", which determines the influence of one variable upon another. A zero strength means that the variable has no direct influence (i.e. the same as removing the additional variable). The strength is expressed by $f$, a function defined on the set of additional variables $A_{i,j}$. $f(A_{i,j}) = (f_1(A_{i,j}), \ldots, f_{m_i}(A_{i,j}))$ is a vector of size $m_i$ (the number of modalities of the variable $V_i$), and $f_k(A_{i,j}) = 1 - H_k(P(A_{i,j} \mid V_i, V_j))$, where $H_k(P(A_{i,j} \mid V_i, V_j))$ is the entropy of line $k$ ($P(A_{i,j} \mid V_i, V_j)$ is a matrix).
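A minimal sketch of the strength vector follows. The text does not say how the entropy of a line is normalized, so two assumptions are made here and should be read as such: each line of the matrix is rescaled to sum to 1 before computing its entropy, and the logarithm is taken in base $m_j$ so that the entropy lies in $[0, 1]$. The function name `link_strength` is hypothetical.

```python
import math


def link_strength(cpt):
    """Return (f_1, ..., f_mi), one strength per line k of P(A = true | Vi = k, Vj).

    Assumes at least two columns and a positive sum on every line.
    """
    strengths = []
    for row in cpt:
        total = sum(row)
        probs = [x / total for x in row]  # assumption: normalize the line
        # Assumption: entropy in base len(row), so that it lies in [0, 1].
        entropy = -sum(p * math.log(p, len(row)) for p in probs if p > 0)
        strengths.append(1.0 - entropy)
    return strengths


# A flat line carries no influence; a peaked line carries the maximum.
print(link_strength([[0.5, 0.5], [0.9, 0.0], [0.9, 0.1]]))
```

Under these assumptions a flat line gives a strength of 0, which matches the remark that a zero strength is equivalent to removing the additional variable.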

Updating criterion used to converge: limiting the direct influence of variables thanks to the strength term. In order to compute the influence of a variable $i$ on another variable $j$, we have to take into account both the direct influence (i.e. through an additional variable $A_{i,j}$) and the indirect influence (i.e. through the other variables on which $i$ and $j$ both depend).

For some configurations however, influences will compensate each other so that they will both tend to a limit state (the probability will tend to 0 or 1), making it difficult to take them into account any further. As a matter of fact, we shall then face (1) possibly infinite convergence towards 0 or 1 and (2) computational problems related to machine precision (the latter being the most important in practice).

In order to solve this problem, we compute a maximum threshold for the strength, which is defined for every pair of variables and for every modality of the source variable as follows:

Let $f_k^0(i, j) = f_k(A_{i,j})$ when $\forall i, j \quad q_{i,j} = p_j$.

This threshold is meant to be used as the link strength if there is no indirect influence. Hence, the iterative algorithm we present in the next section must satisfy at each step: $\forall i, j \quad f_k(A_{i,j}) \le f_k^0(i, j)$ (refer to algorithm 2 in the next section).

4 Representation change algorithm

In this section, we present two complementary algorithms that perform the de[...] representation change is performed with respect to the axioms for any pair of variables (i.e. a single iteration which may or may not lead to convergence).

4.1 Algorithm 1: do N iterations until convergence

while the $P(A_{i,j} \mid V_i, V_j)$ have not converged (the distance from the previous term is more than a given threshold) and the number of iterations has not reached a maximum do
    call algorithm 2
    compute the distance between new and old probabilities
end while
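The outer loop can be sketched as a generic fixed-point iteration. This is a sketch under stated assumptions: `iterate_once` is a hypothetical callable standing in for algorithm 2, the tables are stored as a dictionary of matrices keyed by edge, and the distance is taken as the largest absolute change of any coefficient, a choice the text leaves open.

```python
def run_until_converged(cpts, iterate_once, threshold=1e-6, max_iterations=100):
    """Repeat one full pass over all pairs until the tables stop moving.

    cpts: dict mapping an edge (i, j) to its matrix, stored as a list of rows.
    iterate_once: hypothetical stand-in for algorithm 2, returns updated tables.
    Assumes max_iterations >= 1.
    """
    for iteration in range(1, max_iterations + 1):
        new_cpts = iterate_once(cpts)
        # Assumed distance: largest absolute change of any coefficient.
        distance = max(
            (abs(new - old)
             for edge in cpts
             for new_row, old_row in zip(new_cpts[edge], cpts[edge])
             for new, old in zip(new_row, old_row)),
            default=0.0,
        )
        cpts = new_cpts
        if distance <= threshold:
            break
    return cpts, iteration


# Toy stand-in for algorithm 2: every coefficient moves halfway towards 0.5.
def toy_pass(tables):
    return {e: [[(x + 0.5) / 2 for x in row] for row in m] for e, m in tables.items()}


final, n_iterations = run_until_converged({(1, 2): [[1.0, 0.0]]}, toy_pass)
```

The loop stops as soon as either condition fails, that is, when the change falls under the threshold or when the iteration budget is exhausted.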

4.2 Algorithm 2: do an iteration for all the variable pairs

for all pairs of variables $V_i$, $V_j$ such that there exists a dependency $V_i \to V_j$ do
    if first iteration then
        Set all the additional variables as unobserved.
        Set $q_{i,j} = P(V_j)$.
    else
        Set the variable $A_{i,j}$ unobserved and the other additional variables observed to true.
        Compute $q_{i,j} = P(V_j \mid V_i, K \setminus \{A_{i,j}\} = \mathrm{true})$ using an inference in the Bayesian network. These conditional probabilities represent the indirect influence (i.e. without the link through variable $A_{i,j}$) of $V_i$ on $V_j$.
    end if
    Apply the equations of the first axiom in order to determine the $P(A_{i,j} \mid V_i, V_j)$ up to a multiplicative constant for each line $i$.
    for all lines $k$ of the matrix $P(A_{i,j} \mid V_i, V_j)$ do
        Compute the "strength" $f_k = f_k(A_{i,j}) = 1 - H_k(P(A_{i,j} \mid V_i, V_j))$ of the link $i \to j$.
        if first iteration then
            $f_k^0(i, j) = f_k(A_{i,j})$
        else
            if $f_k > f_k^0$ then
                Compute by dichotomy the $0 \le y \le 1$ such that $f_k(A_{i,j}^y) = f_k^0$ (i.e. all the coefficients of the matrix are raised to the power $y$). This is done in order to "smooth" the parameters, which increases the entropy and thus decreases the "strength".
            end if
        end if
    end for
    Apply the equations of the second axiom to determine the multiplicative constants.
    Compute the matrix $P(A_{i,j} \mid V_i, V_j)$.
end for

[...] number of examples used for learning. The Y-axis shows the Kullback-Leibler distance between the learned joint distribution and the one that was used to generate the learning data. The generator network is shown on the figure (lower-left). The best performing Bayesian and flattened Bayesian networks for 50 examples are also shown on the figure (up).
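The smoothing step of algorithm 2 (finding the exponent $y$ by dichotomy) can be sketched as follows. The same assumptions as for the strength apply and are not taken from the paper: a line is rescaled to sum to 1 and its entropy is measured in base $m_j$. The function names are hypothetical, the line is assumed to have strictly positive coefficients, and the search stops at a numerical tolerance rather than at exact equality.

```python
import math


def strength(row):
    """1 - entropy of the normalized line (assumed normalization and log base)."""
    total = sum(row)
    probs = [x / total for x in row]
    return 1.0 + sum(p * math.log(p, len(row)) for p in probs if p > 0)


def smoothing_exponent(row, target, tol=1e-9):
    """Bisection on y in [0, 1] so that strength([x ** y for x in row]) is close to target.

    y = 0 flattens the line completely (strength 0) and y = 1 leaves it unchanged,
    and the strength grows with y, so the interval can be halved repeatedly.
    """
    if strength(row) <= target:
        return 1.0  # already within the threshold, nothing to smooth
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if strength([x ** mid for x in row]) > target:
            hi = mid
        else:
            lo = mid
    return lo  # the lower bound never exceeds the target strength


row = [0.9, 0.1]
y = smoothing_exponent(row, target=0.2)
smoothed = [x ** y for x in row]
print(y, strength(row), strength(smoothed))
```

Returning the lower bound of the interval keeps the smoothed strength at or just below the threshold, in line with the constraint $f_k(A_{i,j}) \le f_k^0(i, j)$.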

In the next section, we show some experiments that rely on these algorithms.

5 Experiments

5.1 Experimental setup

In order to experimentally validate our approach, we conducted some experiments on the learnability of the networks after a representation change (i.e. flattened Bayesian networks). Our experimental setup is defined as follows:

- a generator network, which can be either a flattened Bayesian network (exp. 1) or a classic Bayesian network (exp. 2). In both experiments, the number of nodes in the generator and learnable networks is fixed (in the case of flattened Bayesian networks, we do not count the additional nodes built by [...]

- a set of learning networks that covers both all the possible classic Bayesian network structures and all the possible flattened Bayesian network structures with the same number of nodes as the generator (i.e. learning is exhaustive for all structures of a given size).

[...] examples used for learning. The Y-axis shows the Kullback-Leibler distance between the learned joint distribution and the one that was used to generate the learning data. The generator network is shown on the figure (lower-left). The best performing Bayesian and flattened Bayesian networks for 50 examples are also shown on the figure (up).

So as to get a good approximation of the results, we compute $N$ data sequences from $M$ random initializations of the generator network. As a consequence, we perform $N \times M$ learning sessions for each target network ($20 \le N \times M \le 50$).

The error is defined as the Kullback-Leibler distance between the joint distribution of a given target network and the distribution of the generator network. In the scope of this paper, the network size for all experiments is limited to 4 so that it is possible to evaluate the performance for all possible structures. As a matter of fact, the number of possible structures grows more than exponentially [...]
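Two quantities from the paragraph above can be made concrete with a short sketch. Both parts rest on assumptions: the direction of the Kullback-Leibler divergence (generator first, learned distribution second) is not stated in the text and is chosen here for illustration, and the count only covers classic Bayesian network structures (labeled directed acyclic graphs, via Robinson's recurrence), not flattened ones. The function names and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two joint distributions stored as dicts over the same assignments.

    Requires q[x] > 0 wherever p[x] > 0. The divergence is not symmetric.
    """
    return sum(pv * math.log(pv / q[x]) for x, pv in p.items() if pv > 0)


def count_dags(n):
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    counts = [1]
    for m in range(1, n + 1):
        counts.append(sum(
            (-1) ** (k + 1) * math.comb(m, k) * 2 ** (k * (m - k)) * counts[m - k]
            for k in range(1, m + 1)
        ))
    return counts[n]


# Hypothetical joint distributions over two binary variables.
generator = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
learned = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
error = kl_divergence(generator, learned)

print(error, [count_dags(n) for n in range(1, 7)])
```

With 4 nodes there are 543 labeled directed acyclic graphs, which is still small enough to enumerate, whereas the count for 6 nodes already exceeds three million.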

5.2 Experiment 1: [...] Bayesian network

Firstly, we study the behavior of flattened Bayesian networks in the most favorable setup, i.e. when learning on data generated by a flattened Bayesian network. In this experiment, the generator is a 4-node cyclic flattened Bayesian network. Figure 2 shows this generator as well as the results obtained with all the flattened Bayesian networks and all the classic Bayesian networks that contain 4 nodes.

This figure shows that the flattened Bayesian networks always perform better, both for average and for best performance. However, learning performance tends to become the same as the number of examples increases ($\ge 250$). Flattened Bayesian networks are thus relevant when learning from such data. Moreover, it should be noted that the best performing flattened Bayesian network is structurally different from the generator, meaning that the most reliable structure when few examples are available is not the structure of the generator itself.

5.3 Experiment 2: Learning from few examples

Secondly, we choose a 4-node classic Bayesian network as data generator (cf. fig. 3). As a consequence, learning with flattened Bayesian networks faces the worst case, since the generator's joint probability can be anything. As a matter of fact, flattened Bayesian networks are supposed to be better for some distributions (unknown at this stage of our research).

Figure 3 shows the results with respect to the experimental setup described earlier. The important result is that the flattened Bayesian networks show the best results, both on average and for the best network, when there are few examples to learn from. However, classic Bayesian networks become better as the number of examples grows. These results clearly show that flattened Bayesian networks pay for the advantage of learning speed with a loss in accuracy in the long term (i.e. a compromise between a fast learning curve and less accurate learning in the long term).

5.4 Discussion

According to the results obtained earlier, it appears that the best networks are also the simplest ones. Thus, it seems more relevant to learn with a simple yet inadequate structure rather than with a more complex structure that is closer to the generator: this can be seen as an explanation for the good learning capabilities of flattened Bayesian networks. Figure 4 tends to confirm this assertion by showing the distribution of classic and flattened Bayesian networks according to the learning performance for a given number of examples (here arbitrarily fixed to 50) in experiment 2. Indeed, this figure shows that flattened Bayesian network [...]
