Bayesian networks
Chapter 14.1–3
Outline
♦ Syntax
♦ Semantics
♦ Parameterized distributions
Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents: $P(X_i \mid Parents(X_i))$
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values
Example
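Topology of network encodes conditional independence assertions:
[Figure: network over Weather, Cavity, Toothache, Catch; Cavity points to Toothache and Catch, Weather has no links]
Weather is independent of the other variables
Toothache and Catch are conditionally independent given Cavity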
Example
I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is there a burglar?
Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects “causal” knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call
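Example contd.
[Figure: Burglary and Earthquake point to Alarm; Alarm points to JohnCalls and MaryCalls. CPTs as follows.]

P(B) = .001        P(E) = .002

B  E  | P(A|B,E)
T  T  | .95
T  F  | .94
F  T  | .29
F  F  | .001

A  | P(J|A)        A  | P(M|A)
T  | .90           T  | .70
F  | .05           F  | .01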
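One way to hold this network in code is a dictionary from each node to its parents and CPT. The sketch below is only an illustration: the names `burglary_net` and `prob` and the layout (a CPT keyed by tuples of parent values, storing only P(node = true)) are assumptions, not part of the slides.

```python
# Burglary network: node -> (parents, CPT).
# Each CPT maps a tuple of parent values to P(node = True | parents).
# The layout and names are illustrative assumptions.
burglary_net = {
    "Burglary":   ((), {(): 0.001}),
    "Earthquake": ((), {(): 0.002}),
    "Alarm":      (("Burglary", "Earthquake"), {
        (True, True): 0.95,
        (True, False): 0.94,
        (False, True): 0.29,
        (False, False): 0.001,
    }),
    "JohnCalls":  (("Alarm",), {(True,): 0.90, (False,): 0.05}),
    "MaryCalls":  (("Alarm",), {(True,): 0.70, (False,): 0.01}),
}


def prob(net, var, value, assignment):
    """Return P(var = value | parents(var)), reading parent values from assignment."""
    parents, cpt = net[var]
    p_true = cpt[tuple(assignment[p] for p in parents)]
    return p_true if value else 1.0 - p_true


# P(Alarm = true | no burglary, no earthquake)
print(prob(burglary_net, "Alarm", True, {"Burglary": False, "Earthquake": False}))
```

Only the probability of `True` is stored per row; the probability of `False` is its complement.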
Compactness
A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values
[Figure: burglary network with nodes B, E, A, J, M]
Each row requires one number $p$ for $X_i = true$ (the number for $X_i = false$ is just $1 - p$)
If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers
I.e., grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution
For burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. $2^5 - 1 = 31$)
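The count for the burglary net can be checked mechanically. This is a minimal sketch assuming all variables are Boolean; the helper name `cpt_numbers` is hypothetical.

```python
def cpt_numbers(parents_by_node):
    """Count independent numbers for Boolean nodes: 2**k for a node with k parents."""
    return sum(2 ** len(parents) for parents in parents_by_node.values())


# Parent sets of the burglary network.
burglary_parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

n = len(burglary_parents)
network_size = cpt_numbers(burglary_parents)  # 1 + 1 + 4 + 2 + 2
full_joint_size = 2 ** n - 1                  # 2^5 - 1
print(network_size, full_joint_size)
```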
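Global semantics
“Global” semantics defines the full joint distribution as the product of the local conditional distributions:
[Figure: burglary network with nodes B, E, A, J, M]
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$
e.g., $P(j \wedge m \wedge a \wedge \neg b \wedge \neg e)$
$= P(j \mid a)\,P(m \mid a)\,P(a \mid \neg b, \neg e)\,P(\neg b)\,P(\neg e)$
$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.00063$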
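The product formula can be evaluated directly from the CPTs. The sketch below assumes a dictionary layout for the burglary network; the names `NET` and `joint` are illustrative, not from the slides.

```python
# (parents, {parent values: P(node = True | parents)}) for the burglary network.
NET = {
    "B": ((), {(): 0.001}),
    "E": ((), {(): 0.002}),
    "A": (("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    "J": (("A",), {(True,): 0.90, (False,): 0.05}),
    "M": (("A",), {(True,): 0.70, (False,): 0.01}),
}


def joint(assignment):
    """Product over all nodes of P(x_i | parents(X_i)) for a full assignment."""
    p = 1.0
    for var, (parents, cpt) in NET.items():
        p_true = cpt[tuple(assignment[u] for u in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


# P(j, m, a, not b, not e)
p = joint({"J": True, "M": True, "A": True, "B": False, "E": False})
print(round(p, 5))
```

Summing `joint` over all 32 assignments gives 1, which is a quick sanity check that the local distributions define a proper joint distribution.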
Local semantics
Local semantics: each node is conditionally independent of its nondescendants given its parents
[Figure: node X with parents U1, ..., Um; children Y1, ..., Yn; children's other parents Z1j, ..., Znj]
Theorem: Local semantics ⇔ global semantics
Markov blanket
Each node is conditionally independent of all others given its Markov blanket: parents + children + children’s parents
[Figure: node X with parents U1, ..., Um; children Y1, ..., Yn; children's other parents Z1j, ..., Znj]
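The Markov blanket can be read off the graph structure alone. A minimal sketch, assuming the network is given as a mapping from each node to its list of parents; the function name `markov_blanket` is hypothetical and the burglary network is used as the test case.

```python
def markov_blanket(parents_by_node, x):
    """Return the parents, children, and children's other parents of node x."""
    children = [n for n, ps in parents_by_node.items() if x in ps]
    blanket = set(parents_by_node[x]) | set(children)
    for child in children:
        blanket |= set(parents_by_node[child])
    blanket.discard(x)  # x is a parent of its own children; exclude it
    return blanket


burglary_parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

print(sorted(markov_blanket(burglary_parents, "Burglary")))  # ['Alarm', 'Earthquake']
```

Earthquake lands in Burglary's blanket because the two share the child Alarm.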
Constructing Bayesian networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics
1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$
   add $X_i$ to the network
   select parents from $X_1, \ldots, X_{i-1}$ such that $P(X_i \mid Parents(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$
This choice of parents guarantees the global semantics:
$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$ (chain rule)
$= \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$ (by construction)
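As a worked instance, take the causal ordering B, E, A, J, M for the burglary network (an ordering chosen here for illustration). The chain rule and the parent choices of that network then give:

```latex
\begin{align*}
P(B,E,A,J,M) &= P(B)\,P(E \mid B)\,P(A \mid B,E)\,P(J \mid B,E,A)\,P(M \mid B,E,A,J) && \text{(chain rule)}\\
             &= P(B)\,P(E)\,P(A \mid B,E)\,P(J \mid A)\,P(M \mid A) && \text{(by construction)}
\end{align*}
```

Each factor in the second line drops exactly those predecessors that are not parents of the node.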
Example
Suppose we choose the ordering M, J, A, B, E
[Figure: resulting network over MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]
P(J|M) = P(J)? No
P(A|J,M) = P(A|J)? P(A|J,M) = P(A)? No
P(B|A,J,M) = P(B|A)? Yes
P(B|A,J,M) = P(B)? No
P(E|B,A,J,M) = P(E|A)? No
P(E|B,A,J,M) = P(E|A,B)? Yes
Example contd.
[Figure: network built with the ordering M, J, A, B, E over MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]
Deciding conditional independence is hard in noncausal directions
(Causal models and conditional independence seem hardwired for humans!)
Assessing conditional probabilities is hard in noncausal directions
Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
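The count of 13 follows from the parent sets implied by the Yes/No answers for the ordering M, J, A, B, E. A minimal sketch, assuming Boolean variables; the dictionary and helper names are illustrative.

```python
def cpt_numbers(parents_by_node):
    """Count independent numbers for Boolean nodes: 2**k for a node with k parents."""
    return sum(2 ** len(parents) for parents in parents_by_node.values())


# Parents selected when nodes are added in the order M, J, A, B, E.
noncausal_parents = {
    "MaryCalls": [],
    "JohnCalls": ["MaryCalls"],
    "Alarm": ["JohnCalls", "MaryCalls"],
    "Burglary": ["Alarm"],
    "Earthquake": ["Alarm", "Burglary"],
}

noncausal_size = cpt_numbers(noncausal_parents)  # 1 + 2 + 4 + 2 + 4
print(noncausal_size)
```

The same joint distribution needs 10 numbers with the causal ordering, so the ordering alone accounts for the extra 3.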
Example: Car diagnosis
Initial evidence: car won’t start
Testable variables (green), “broken, so fix it” variables (orange)
Hidden variables (gray) ensure sparse structure, reduce parameters
[Figure: car diagnosis network with nodes battery age, alternator broken, fanbelt broken, battery dead, no charging, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, battery meter, dipstick, car won’t start]
Example: Car insurance
[Figure: car insurance network with nodes SocioEcon, Age, GoodStudent, ExtraCar, Mileage, VehicleYear, RiskAversion, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, DrivQuality, Antilock, Airbag, CarValue, HomeBase, AntiTheft, Theft, OwnDamage, PropertyCost, LiabilityCost, MedicalCost, Cushioning, Ruggedness, Accident, OtherCost, OwnCost]
Compact conditional distributions
CPT grows exponentially with number of parents
CPT becomes infinite with continuous-valued parent or child
Solution: canonical distributions that are defined compactly
Deterministic nodes are the simplest case: $X = f(Parents(X))$ for some function $f$
E.g., Boolean functions
NorthAmerican ⇔ Canadian ∨ US ∨ Mexican
E.g., numerical relationships among continuous variables
$\frac{\partial \mathit{Level}}{\partial t} = \mathit{inflow} + \mathit{precipitation} - \mathit{outflow} - \mathit{evaporation}$
Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes
1) Parents $U_1 \ldots U_k$ include all causes (can add leak node)
2) Independent failure probability $q_i$ for each cause alone
⇒ $P(X \mid U_1 \ldots U_j, \neg U_{j+1} \ldots \neg U_k) = 1 - \prod_{i=1}^{j} q_i$

Cold  Flu  Malaria | P(Fever)  P(¬Fever)
F     F    F       | 0.0       1.0
F     F    T       | 0.9       0.1
F     T    F       | 0.8       0.2
F     T    T       | 0.98      0.02 = 0.2 × 0.1
T     F    F       | 0.4       0.6
T     F    T       | 0.94      0.06 = 0.6 × 0.1
T     T    F       | 0.88      0.12 = 0.6 × 0.2
T     T    T       | 0.988     0.012 = 0.6 × 0.2 × 0.1

Number of parameters linear in number of parents
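The full Fever table can be generated from the three failure probabilities alone. The sketch below assumes no leak node, and the function name `noisy_or` is hypothetical; the values 0.6, 0.2, 0.1 are the single-cause entries of P(¬Fever) in the table.

```python
from itertools import product


def noisy_or(q, parent_values):
    """P(X = true | parents) = 1 - product of q_i over the parents that are true."""
    p_false = 1.0
    for cause, present in parent_values.items():
        if present:
            p_false *= q[cause]
    return 1.0 - p_false


# Failure probabilities q_i: P(no fever | only cause i present).
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

for cold, flu, malaria in product([False, True], repeat=3):
    values = {"Cold": cold, "Flu": flu, "Malaria": malaria}
    print(cold, flu, malaria, round(noisy_or(q, values), 3))
```

Three numbers reproduce all eight rows, which is the linear-in-parents saving the slide refers to.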
Hybrid (discrete+continuous) networks
Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)
[Figure: Subsidy? and Harvest point to Cost; Cost points to Buys?]
Option 1: discretization (possibly large errors, large CPTs)
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete+continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)
Continuous child variables
Need one conditional density function for child variable given continuous parents, for each possible assignment to discrete parents
Most common is the linear Gaussian model, e.g.:
$P(\mathit{Cost} = c \mid \mathit{Harvest} = h, \mathit{Subsidy?} = true) = N(a_t h + b_t, \sigma_t)(c)$
$= \frac{1}{\sigma_t \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{c - (a_t h + b_t)}{\sigma_t}\right)^2\right)$
Mean Cost varies linearly with Harvest, variance is fixed
Linear variation is unreasonable over the full range but works OK if the likely range of Harvest is narrow
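The linear Gaussian density is straightforward to evaluate. In this sketch the function name and the parameter values (`a = -1.0`, `b = 10.0`, `sigma = 1.0`) are hypothetical choices for illustration; the slides give no numbers.

```python
import math


def linear_gaussian_pdf(c, h, a, b, sigma):
    """Density of Cost = c given Harvest = h: Gaussian with mean a*h + b, std sigma."""
    mean = a * h + b
    z = (c - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical parameters for the Subsidy? = true case (not from the slides).
a_t, b_t, sigma_t = -1.0, 10.0, 1.0

# Mean cost falls linearly as harvest grows; the spread stays the same.
for h in (2.0, 4.0, 6.0):
    peak_cost = a_t * h + b_t
    print(h, peak_cost, linear_gaussian_pdf(peak_cost, h, a_t, b_t, sigma_t))
```

A second parameter triple would be needed for Subsidy? = false, one density per assignment to the discrete parents.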
Continuous child variables
[Figure: surface plot of P(Cost | Harvest, Subsidy? = true) over Cost and Harvest]
All-continuous network with LG distributions ⇒ full joint distribution is a multivariate Gaussian
Discrete+continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values
Discrete variable w/ continuous parents
Probability of Buys? given Cost should be a “soft” threshold:
[Figure: plot of P(Buys? = false | Cost = c) against Cost c]
Probit distribution uses integral of Gaussian:
$\Phi(x) = \int_{-\infty}^{x} N(0,1)(t)\,dt$
$P(\mathit{Buys?} = true \mid \mathit{Cost} = c) = \Phi((-c + \mu)/\sigma)$
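The standard normal CDF can be written with the error function, Φ(x) = (1 + erf(x/√2))/2, which gives a short sketch of the probit model. The function names and the values `mu = 6.0`, `sigma = 1.0` are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Phi(x), the integral of the standard Gaussian density up to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) = Phi((-c + mu) / sigma)."""
    return std_normal_cdf((-c + mu) / sigma)


# Hypothetical threshold location and width (not from the slides).
mu, sigma = 6.0, 1.0

for c in (4.0, 6.0, 8.0):
    print(c, probit_buys(c, mu, sigma))
```

The probability of buying is exactly 0.5 at c = mu and falls off smoothly as cost rises, which is the soft threshold.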
Why the probit?
1. It’s sort of the right shape
2. Can view as hard threshold whose location is subject to noise
[Figure: Buys? as a hard threshold on Cost plus Noise]
Discrete variable contd.
Sigmoid (or logit) distribution also used in neural networks:
$P(\mathit{Buys?} = true \mid \mathit{Cost} = c) = \frac{1}{1 + \exp\left(-2\,\frac{-c + \mu}{\sigma}\right)}$
Sigmoid has similar shape to probit but much longer tails:
[Figure: plot of P(Buys? = false | Cost = c) against Cost c]
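The longer tails show up when both curves are evaluated far from the threshold. This is a small sketch with hypothetical parameters `mu = 6.0` and `sigma = 1.0`; the function names are illustrative.

```python
import math


def probit_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) under the probit model."""
    return 0.5 * (1.0 + math.erf((mu - c) / (sigma * math.sqrt(2.0))))


def sigmoid_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) under the sigmoid (logit) model."""
    return 1.0 / (1.0 + math.exp(-2.0 * (mu - c) / sigma))


# Hypothetical threshold location and width (not from the slides).
mu, sigma = 6.0, 1.0

for k in (0, 1, 2, 3, 4):
    c = mu + k * sigma  # cost k standard widths above the threshold
    print(k, probit_buys(c, mu, sigma), sigmoid_buys(c, mu, sigma))
```

Both models give 0.5 at the threshold. With these parameters the sigmoid assigns roughly ten times the probit's probability at four widths above it.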
Summary
Bayes nets provide a natural representation for (causally induced) conditional independence
Topology + CPTs = compact representation of joint distribution
Generally easy for (non)experts to construct
Canonical distributions (e.g., noisy-OR) = compact representation of CPTs
Continuous variables ⇒ parameterized distributions (e.g., linear Gaussian)