HAL Id: hal-02378370
https://hal.archives-ouvertes.fr/hal-02378370
Submitted on 25 Nov 2019
Solving sequential collective decision problems under
qualitative uncertainty
Nahla Ben Amor, Fatma Essghaier, Hélène Fargier
To cite this version:
Nahla Ben Amor, Fatma Essghaier, Hélène Fargier. Solving sequential collective decision problems
under qualitative uncertainty. International Journal of Approximate Reasoning, Elsevier, 2019, 109,
pp.1-18. ⟨10.1016/j.ijar.2019.03.003⟩. ⟨hal-02378370⟩
This is an author’s version published in:
http://oatao.univ-toulouse.fr/25042
ISSN 0888-613X
Official URL (DOI): https://doi.org/10.1016/j.ijar.2019.03.003
Solving sequential collective decision problems under qualitative uncertainty ✩
Nahla Ben Amor a, Fatma Essghaier a,b, Hélène Fargier b
a LARODEC, University of Tunis, Tunisia
b IRIT, UPS-CNRS, 118 route de Narbonne, 31062 Toulouse, France
Abstract

This paper addresses the question of sequential collective decision making under qualitative uncertainty. It takes up the criteria introduced in previous works [4–6] by Ben Amor et al. and extends them to a more general context where every decision maker is free to have an optimistic or a pessimistic attitude w.r.t. uncertainty. These criteria are then considered for the optimization of possibilistic decision trees, and an algorithmic study is performed for each of them. When the global utility satisfies the monotonicity property, a classical possibilistic Dynamic Programming can be applied. Otherwise, two cases are possible: either the criterion is max oriented (the greater the satisfaction of any agent, the greater the global satisfaction), and a dedicated algorithm can be proposed that relies on as many calls to Dynamic Programming as the number of decision makers; or the criterion is min oriented (all the agents must like the common decision), and the optimal strategy can be provided by a Branch and Bound algorithm. The paper concludes with an experimental study that shows the feasibility of the approaches and details to what extent simple Dynamic Programming algorithms can be used as approximation procedures for the non-monotonic criteria.

Keywords: Possibility theory; Qualitative uncertainty; Collective decision making; Sequential decision making; Decision tree; Dynamic Programming
1. Introduction

The handling of a collective decision problem under uncertainty relies on (i) the identification of a theory of decision making under uncertainty (DMU) that captures the decision makers' behavior with respect to uncertainty and (ii) the specification of a collective utility function (CUF) as it may be used when the problem is not pervaded with uncertainty. One also needs to specify when the utility of the agents is to be evaluated: before (ex-ante) or after (ex-post) the realization of the uncertain events. In the first case, the global utility function is a function of the DMU utilities of the different agents; in the second case, it is an aggregation, w.r.t. the likelihood of the final states, of the collective utilities.

Following Fleming [23], Harsanyi [27] has shown that, when the uncertainty about the consequences of decisions can be quantified in a probabilistic way, the collective utility should be a weighted sum of the individual expected utilities. Many contributions have been inspired by this seminal work. Some authors (such as Diamond [11]) criticized this approach because it is not applicable when the collective utility is more egalitarian than utilitarian. Others have developed Harsanyi's approach, in particular Myerson [37], who proved that only the choice of a utilitarian social welfare function can reconcile the ex-ante and ex-post approaches. In the probabilistic case, however, all other welfare functions suffer from the "timing effect" [37], i.e., lead to a discrepancy between the ex-ante and the ex-post approach.

Harsanyi's and Myerson's results rely on the assumption that the knowledge of the agents about the consequences of their decisions is rich enough to be modeled by probabilistic lotteries. When the information about uncertainty cannot be quantified in a probabilistic way, possibilistic decision theory is often a natural one to consider [12–14,18,21,25]. Qualitative decision theory is relevant, among other fields, for applications to planning under uncertainty, where a suitable strategy (i.e., a set of conditional or unconditional decisions) is to be found, starting from a qualitative description of the initial world, of the available alternatives, of their (perhaps uncertain) effects and of the goal to reach (see [8,41,43]). But up to this point, the evaluation of the strategies was considered only in a simple, mono-agent context, while it is often the case that several agents are involved in the decision.

The present paper raises the question of sequential collective decision making under possibilistic uncertainty. It follows recent works [4–6] which propose a theoretical framework for multi-agent (non-sequential) decision making under possibilistic uncertainty. It extends these results to decision problems where agents may have different attitudes w.r.t. uncertainty (some may be optimistic while the others are pessimistic) and provides new decision rules for this specific case. Then, we tackle the problem of (possibilistic) sequential decision making and provide an algorithmic study for strategy optimization in collective possibilistic decision trees.

The remainder of this paper is organized as follows. The next Section recalls the basic notions on which our work relies (decision under possibilistic uncertainty, collective utility functions, etc.). In Section 3, we take up the decision criteria introduced in [4] and define new ones for agents with different attitudes. Section 4 is devoted to strategy optimization in collective possibilistic decision trees, using the decision rules previously defined. The picture is finally completed by an experimental study, presented in Section 5.

✩ This paper is an extended version of preliminary results presented in [7]; it includes the full proofs of the propositions, and new models and algorithms that allow the modeling of collective sequential problems where the agents have different attitudes w.r.t. uncertainty.
E-mail addresses: nahla.benamor@gmx.fr (N. Ben Amor), essghaier.fatma@gmail.com (F. Essghaier), fargier@irit.fr (H. Fargier).
2. Background and basic notions

2.1. Collective utility functions

Let us consider a set $A = \{1, \dots, p\}$ of agents that have to make a decision. Each agent $i \in A$ is supposed to express his/her preferences on a set of alternatives (say, a set $X$) by a ranking function or a utility function $u_i$ that associates to each element of $X$ a value in a subset of $\mathbb{R}^+$ (typically in the unit interval $[0,1]$). In the absence of uncertainty, each decision leads to a unique consequence and a utility vector $\vec{u} = \langle u_1, \dots, u_p \rangle$ is associated to each of them. Besides, when agents are not equally important, we define a vector $\vec{w} = \langle w_1, \dots, w_p \rangle$ where each agent $i$ is equipped with a weight $w_i \in [0,1]$ reflecting its importance. Thus, solving the problem comes down to computing a global utility degree that reflects the collective preference by aggregating the different $u_i$'s.

In a qualitative framework, such an aggregation shall be either conjunctive (i.e., based on a weighted min) or disjunctive (i.e., based on a weighted max); see [15] for more details about weighted min and weighted max aggregations. Formally, these aggregations are defined as follows:
Disjunctive aggregation:
\[ Agg^{max}(x) = \max_{i \in A} \min(w_i, u_i(x)). \tag{1} \]
Conjunctive aggregation:
\[ Agg^{min}(x) = \min_{i \in A} \max(1 - w_i, u_i(x)). \tag{2} \]

2.2. Multi-agent decision making under risk
In a framework of decision making under risk, when the information about the consequences of decisions is probabilistic, a popular criterion to compare alternatives is the expected utility model axiomatized by Von Neumann and Morgenstern [35]: an elementary decision is modeled by a probability distribution over the set $X$ of possible outcomes. It is called a simple probabilistic lottery and is denoted by $L = \langle \lambda_1/x_1, \dots, \lambda_n/x_n \rangle$, where $\lambda_j = p(x_j)$ is the probability that the decision leads to outcome $x_j$. Also, it is supposed that the preferences of a single decision maker are captured by a utility function $u$ assigning a numerical value to each $x_j$. Solving such problems amounts to evaluating risky alternatives and choosing among them. In other words, we compute the expected utility of each lottery and select the one with the highest value (the greater, the better).

When several agents are involved, the aggregation of individual preferences under risk raises a particular problem, depending on when the utility of the agents is to be evaluated: before or after the consideration of uncertainty. This yields two different approaches, namely the ex-ante and the ex-post aggregation:

• The ex-ante approach consists in first computing the utility of each agent, and then performing the aggregation using the agents' weights.
• The ex-post approach consists in first determining the aggregated utility (conjunctive or disjunctive) relative to each possible outcome of $X$, and then considering the uncertainty and the likelihood of states.

For the same decision problem and using the same criterion, the two approaches do not always coincide: aggregating the agents' attitudes before or after the consideration of uncertainty may lead to different conclusions. This phenomenon has been identified by Myerson [37] as the "timing effect".
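The timing effect can be checked numerically. The following minimal sketch contrasts the two aggregation orders in the probabilistic setting, assuming an egalitarian (min) collective utility without weights; the lottery, the utilities and all function names are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

def expected_utility(lottery, u):
    """Expected utility of a probabilistic lottery {outcome: probability}."""
    return sum(p * u[x] for x, p in lottery.items())

def ex_ante_min(lottery, utilities):
    """Ex-ante: evaluate each agent first, then keep the least satisfied one."""
    return min(expected_utility(lottery, u) for u in utilities)

def ex_post_min(lottery, utilities):
    """Ex-post: aggregate the agents on each outcome, then take the expectation."""
    collective = {x: min(u[x] for u in utilities) for x in lottery}
    return expected_utility(lottery, collective)

# Hypothetical data: a fair coin between two outcomes, each good for one agent only.
lottery = {"xa": Fraction(1, 2), "xb": Fraction(1, 2)}
u1 = {"xa": 1, "xb": 0}
u2 = {"xa": 0, "xb": 1}

print(ex_ante_min(lottery, [u1, u2]))  # 1/2
print(ex_post_min(lottery, [u1, u2]))  # 0
```

Each agent expects 1/2 before the uncertainty is resolved, yet on every outcome one of the two agents ends up with utility 0, so the two orders of aggregation disagree.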
2.3. Single agent decision making under possibilistic uncertainty

The expected utility model owes its popularity essentially to its strong axiomatic justifications [35,44]. However, it involves only a quantitative representation of uncertainty, and it has been proved that this formalism cannot represent all decision makers' behaviors [1,22]. Besides, when the decision maker is unable to express his/her uncertainty and preferences numerically and can only give an order among different alternatives, the probabilistic framework remains inappropriate and non-probabilistic models (such as imprecise probabilities [46], evidence theory [45], rough set theory [39] and possibility theory [16,19,47,48]) become relevant alternatives. In particular, one may consider qualitative possibilistic decision rules that have emerged with the growth of possibility theory.

Possibility theory is a framework for handling uncertainty that stems from fuzzy set theory. It was introduced by Zadeh [47,48] and further developed by Dubois and Prade [16,17]. The basic building block in this framework is the notion of possibility distribution. It represents the knowledge of a decision maker about the state of the world. A possibility distribution is denoted by $\pi$ and maps each state $s$ in the universe of discourse $S$ to a bounded linearly ordered scale $V$, typically the unit interval $[0,1]$. This scale can be interpreted in a quantitative or a qualitative way; the latter is the context of our work. Independently of the scale used, given $s \in S$, $\pi(s) = 1$ means that the realization of $s$ is totally possible and $\pi(s) = 0$ means that $s$ is impossible. $\pi$ is said to be normalized if there exists at least one $s \in S$ that is totally possible. Extreme cases of knowledge in possibility theory are complete knowledge and total ignorance. In the first case, we assign 1 to a totally possible state $s_0$ and 0 otherwise (i.e., $\exists s_0, \pi(s_0) = 1$ and $\forall s \neq s_0, \pi(s) = 0$). In the second one, we assign 1 to all situations (i.e., $\pi(s) = 1, \forall s \in S$).

Possibility theory is characterized by the use of two dual measures, namely the possibility measure $\Pi$ and the necessity measure $N$, defined by:

• Possibility measure: $\Pi(E) = \max_{s \in E} \pi(s)$. It denotes the possibility degree evaluating at which level an event $E \subseteq S$ is consistent with the knowledge represented by $\pi$.
• Necessity measure: $N(E) = 1 - \Pi(\neg E) = \min_{s \notin E} (1 - \pi(s))$. It denotes the necessity degree evaluating at which level an event $E \subseteq S$ is certainly implied by the knowledge.
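The two measures can be sketched in a few lines. This is only an illustration under the assumption that a possibility distribution is stored as a dictionary from states to degrees in [0,1]; the function names and the sample distribution are hypothetical.

```python
def possibility(pi, event):
    """Pi(E): the largest possibility degree of a state in E (0 if E is empty)."""
    return max((pi[s] for s in event), default=0)

def necessity(pi, event):
    """N(E) = 1 - Pi(not E); the complement is taken in S, the keys of pi."""
    return 1 - possibility(pi, set(pi) - set(event))

def is_normalized(pi):
    """True if at least one state is totally possible."""
    return max(pi.values()) == 1

pi = {"s1": 1.0, "s2": 0.5, "s3": 0.0}  # hypothetical distribution
print(possibility(pi, {"s1", "s2"}), necessity(pi, {"s1", "s2"}))  # 1.0 1.0
print(possibility(pi, {"s2"}), necessity(pi, {"s2"}))              # 0.5 0.0
```

The event {s2} is somewhat possible but not at all certain, because a state outside it (s1) is totally possible.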
Following Dubois and Prade's possibilistic approach to decision making under qualitative uncertainty [18], a decision can be seen as a possibility distribution over a finite set of outcomes $X$, called a (simple) possibilistic lottery. Such a lottery is denoted by $L = \langle \lambda_1/x_1, \dots, \lambda_n/x_n \rangle$, where $\lambda_j = \pi_L(x_j)$ is the possibility that decision $L$ leads to outcome $x_j$; this possibility degree can also be denoted by $L[x_j]$. In this framework, a decision problem is thus fully specified by a set of possibilistic lotteries on $X$ and a utility function $u : X \mapsto [0,1]$ expressing the decision maker's preferences. Under the assumption that the utility scale and the possibility scale are commensurate and purely ordinal, Dubois and Prade propose to evaluate each lottery by a qualitative, optimistic or pessimistic utility [18]:

Optimistic utility:
\[ U^{+}(L) = \max_{x_j \in X} \min(L[x_j], u(x_j)). \tag{3} \]
Pessimistic utility:
\[ U^{-}(L) = \min_{x_j \in X} \max(1 - L[x_j], u(x_j)). \tag{4} \]

$U^{+}(L)$ is a mild version of the maximax criterion: $L$ is good as soon as it is totally plausible that it gives a good consequence. On the contrary, the pessimistic utility $U^{-}(L)$ estimates the utility of an act by its worst possible consequence: its value is high whenever $L$ gives good consequences in every "rather plausible" state. These two utilities can be seen as the ordinal counterparts of the expected utility criterion and have been axiomatized in the style of Von Neumann and Morgenstern [18] and Savage [20].
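Eqs. (3) and (4) translate directly into code. The sketch below assumes that a lottery and a utility function are both dictionaries over the same outcomes; the names and the numerical values are illustrative only.

```python
def optimistic_utility(lottery, u):
    """U+(L) = max over outcomes of min(L[x], u(x)), Eq. (3)."""
    return max(min(lottery[x], u[x]) for x in lottery)

def pessimistic_utility(lottery, u):
    """U-(L) = min over outcomes of max(1 - L[x], u(x)), Eq. (4)."""
    return min(max(1 - lottery[x], u[x]) for x in lottery)

# Hypothetical lottery: x1 is totally possible but poor, x2 is less possible but good.
lottery = {"x1": 1.0, "x2": 0.5}
u = {"x1": 0.25, "x2": 0.75}
print(optimistic_utility(lottery, u))   # 0.5
print(pessimistic_utility(lottery, u))  # 0.25
```

The optimistic value is capped by the possibility of the good outcome, while the pessimistic value is driven by the totally possible poor outcome.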
3. A possibilistic approach to collective decision making

3.1. Collective qualitative decision rules

Let us now consider collective decision making under qualitative uncertainty. In this framework, the decision makers' attitude with respect to uncertainty can be either optimistic ($U^{+}$) or pessimistic ($U^{-}$), and the aggregation of the agents' preferences can be either conjunctive, egalitarian ($Agg^{min}$) or disjunctive, non-egalitarian ($Agg^{max}$).

Consider the decision problem defined by a set of agents $A$, a vector of utility functions $\vec{u} = \langle u_1, \dots, u_p \rangle$, a weighting vector $\vec{w} = \langle w_1, \dots, w_p \rangle$ and a possibilistic lottery $L = \langle L[x_1]/\vec{u}(x_1), \dots, L[x_n]/\vec{u}(x_n) \rangle$ on a set of consequences $X$. Ben Amor et al. [4–6] have proposed and axiomatized four ex-ante and four ex-post decision criteria to solve such problems.

Fig. 1. Possibilistic (constant) lottery of Example 1.
\[ U^{-min}_{ante}(L) = \min_{i \in A} \max\big(1 - w_i, \min_{x_j \in X} \max(u_i(x_j), 1 - L[x_j])\big). \tag{5} \]
\[ U^{-max}_{ante}(L) = \max_{i \in A} \min\big(w_i, \min_{x_j \in X} \max(u_i(x_j), 1 - L[x_j])\big). \tag{6} \]
\[ U^{+min}_{ante}(L) = \min_{i \in A} \max\big(1 - w_i, \max_{x_j \in X} \min(u_i(x_j), L[x_j])\big). \tag{7} \]
\[ U^{+max}_{ante}(L) = \max_{i \in A} \min\big(w_i, \max_{x_j \in X} \min(u_i(x_j), L[x_j])\big). \tag{8} \]
\[ U^{-min}_{post}(L) = \min_{x_j \in X} \max\big(1 - L[x_j], \min_{i \in A} \max(u_i(x_j), 1 - w_i)\big). \tag{9} \]
\[ U^{-max}_{post}(L) = \min_{x_j \in X} \max\big(1 - L[x_j], \max_{i \in A} \min(u_i(x_j), w_i)\big). \tag{10} \]
\[ U^{+min}_{post}(L) = \max_{x_j \in X} \min\big(L[x_j], \min_{i \in A} \max(u_i(x_j), 1 - w_i)\big). \tag{11} \]
\[ U^{+max}_{post}(L) = \max_{x_j \in X} \min\big(L[x_j], \max_{i \in A} \min(u_i(x_j), w_i)\big). \tag{12} \]

As for the notation: the subscript indicates the approach used (ex-ante or ex-post), and the superscript denotes the decision makers' attitude w.r.t. uncertainty (pessimistic "−" or optimistic "+") and the aggregation of the agents' preferences (conjunctive "min" or disjunctive "max").
The $U^{-min}_{ante}$ utility, for instance, considers that the decision makers are pessimistic and computes the pessimistic utility of each of them. Then, the $U_i^{-}$'s are aggregated on a cautious basis: the higher the satisfaction of the least satisfied of the important agents, the better the lottery. Using the same notations, $U^{-max}_{post}$ considers that a consequence $x_j$ is good as soon as one of the important agents is satisfied: a max-based aggregation of the utilities is performed, yielding a unique utility function $Agg()$ on the basis of which the pessimistic utility is computed.

In Ref. [6], the authors proposed a qualitative counterpart of Harsanyi's theorem [27] and showed that the fully min-oriented and fully max-oriented ex-ante utilities are equivalent to their ex-post counterparts, i.e., $U^{-min}_{ante} = U^{-min}_{post}$ and $U^{+max}_{ante} = U^{+max}_{post}$. But $U^{-max}_{ante}$ (resp. $U^{+min}_{ante}$) may differ from $U^{-max}_{post}$ (resp. from $U^{+min}_{post}$): these criteria suffer from the timing effect.
Example 1. Consider two equally important agents 1 and 2 ($w_1 = w_2 = 1$), and a lottery $L = \langle 1/x_a, 1/x_b \rangle$ defining a state of total ignorance about consequences $x_a$ and $x_b$ ($\pi(x_a) = \pi(x_b) = 1$) (see Fig. 1). The first consequence is good for 1 and bad for 2, and the second one is bad for 1 and good for 2: $u_1(x_a) = u_2(x_b) = 1$ and $u_2(x_a) = u_1(x_b) = 0$.

It is easy to check that $U^{+min}_{ante}(L) = 1 \neq U^{+min}_{post}(L) = 0$, where:
\[ U^{+min}_{post}(L) = \max\big(\min(1, \min(\max(1-1, 1), \max(1-1, 0))),\ \min(1, \min(\max(1-1, 0), \max(1-1, 1)))\big) = 0. \]
\[ U^{+min}_{ante}(L) = \min\big(\max(1-1, \max(\min(1, 1), \min(1, 0))),\ \max(1-1, \max(\min(1, 0), \min(1, 1)))\big) = 1. \]
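The eight criteria of Eqs. (5)–(12) differ only in the attitude used, the aggregation used and the order in which they are applied, so they can be sketched with two generic functions. The code below is a minimal illustration under the assumption that lotteries and utilities are dictionaries; the function names are hypothetical. It reproduces the values of Example 1.

```python
def agg_min(values, weights):
    """Conjunctive (egalitarian) aggregation, Eq. (2)."""
    return min(max(1 - w, v) for w, v in zip(weights, values))

def agg_max(values, weights):
    """Disjunctive aggregation, Eq. (1)."""
    return max(min(w, v) for w, v in zip(weights, values))

def u_plus(lottery, u):
    """Optimistic utility, Eq. (3)."""
    return max(min(lottery[x], u[x]) for x in lottery)

def u_minus(lottery, u):
    """Pessimistic utility, Eq. (4)."""
    return min(max(1 - lottery[x], u[x]) for x in lottery)

def ex_ante(lottery, utilities, weights, attitude, agg):
    """Evaluate every agent with `attitude`, then aggregate the p values."""
    return agg([attitude(lottery, u) for u in utilities], weights)

def ex_post(lottery, utilities, weights, attitude, agg):
    """Aggregate the agents on every outcome, then apply `attitude` once."""
    collective = {x: agg([u[x] for u in utilities], weights) for x in lottery}
    return attitude(lottery, collective)

# Data of Example 1
L = {"xa": 1, "xb": 1}
us = [{"xa": 1, "xb": 0}, {"xa": 0, "xb": 1}]
w = [1, 1]

print(ex_ante(L, us, w, u_plus, agg_min), ex_post(L, us, w, u_plus, agg_min))    # 1 0
print(ex_ante(L, us, w, u_minus, agg_max), ex_post(L, us, w, u_minus, agg_max))  # 0 1
```

On the same data, the fully min-oriented and fully max-oriented criteria give identical ex-ante and ex-post values, in line with the equivalence recalled above.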
3.2. Possibilistic decision rules for agents with different attitudes w.r.t. uncertainty

The decision rules presented in the previous section assume that all the agents are either purely optimistic or purely pessimistic. However, the attitude of each decision maker has a major impact on the chosen alternative or decision: for the same decision problem, merely varying the attitude of one decision maker may radically change the result. Besides, in real-world problems, it is seldom the case that all the decision makers have the same attitude: the group may gather pessimistic as well as optimistic persons. So, the decision has to be made by considering the individual differences in tolerance and intolerance for uncertainty. In the following, we drop the assumption of a common attitude for the collectivity and propose more general decision rules that are appropriate for situations where each agent is free to express his/her attitude w.r.t. uncertainty.

Nevertheless, dealing with such problems imposes the use of ex-ante aggregation and forces us to give up the ex-post [...] agent representing the collectivity), imposes the use of one and only one attitude. Hence, it is not possible to respect each agent's attitude towards uncertainty with such a method. The ex-ante approach, on the contrary, allows the handling of the different attitudes of heterogeneous agents. This leads us to extend the ex-ante decision rules as follows:
Definition 1. Given a possibilistic lottery $L$ on $X$, a set of agents $A$ (where each agent $i$ is either optimistic or pessimistic), a vector of utility functions $\vec{u}$ and a weighting vector $\vec{w}$, let:
\[ U^{max}_{ante}(L) = \max_{i \in A} \min\big(w_i, \otimes_{x_j \in X} \oplus(u_i(x_j), \Lambda[x_j])\big), \tag{13} \]
\[ U^{min}_{ante}(L) = \min_{i \in A} \max\big(1 - w_i, \otimes_{x_j \in X} \oplus(u_i(x_j), \Lambda[x_j])\big), \tag{14} \]
where $\otimes = \min$ (resp. $\max$), $\oplus = \max$ (resp. $\min$) and $\Lambda[x_j] = 1 - L[x_j]$ (resp. $L[x_j]$) if the agent $i$ is pessimistic (resp. optimistic).
Example 2. Consider two agents 1 and 2 having the same importance ($w_1 = w_2 = 1$) such that 1 is optimistic and 2 is pessimistic, and consider the lottery $L$ defined in Example 1: $L = \langle 1/\langle 1, 0 \rangle, 1/\langle 0, 1 \rangle \rangle$. We get the following results:
\[ \begin{aligned} U^{max}_{ante}(L) &= \max\big(\min(w_1, U_1^{+}(L)), \min(w_2, U_2^{-}(L))\big) \\ &= \max\big(\min(1, \max(\min(1, 1), \min(1, 0))),\ \min(1, \min(\max(1-1, 0), \max(1-1, 1)))\big) = 1. \end{aligned} \]
\[ \begin{aligned} U^{min}_{ante}(L) &= \min\big(\max(1 - w_1, U_1^{+}(L)), \max(1 - w_2, U_2^{-}(L))\big) \\ &= \min\big(\max(1-1, \max(\min(1, 1), \min(1, 0))),\ \max(1-1, \min(\max(1-1, 0), \max(1-1, 1)))\big) = 0. \end{aligned} \]
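Definition 1 can be sketched by attaching an attitude flag to each agent. The representation below (a list of triples holding a utility dictionary, a weight and a Boolean flag) and the function names are assumptions made for illustration; the data are those of Example 2.

```python
def individual_utility(lottery, u, optimistic):
    """U+ (Eq. 3) for an optimistic agent, U- (Eq. 4) for a pessimistic one."""
    if optimistic:
        return max(min(lottery[x], u[x]) for x in lottery)
    return min(max(1 - lottery[x], u[x]) for x in lottery)

def u_ante_max(lottery, agents):
    """Eq. (13); `agents` is a list of (utility, weight, optimistic) triples."""
    return max(min(w, individual_utility(lottery, u, opt)) for u, w, opt in agents)

def u_ante_min(lottery, agents):
    """Eq. (14), with the same representation of agents."""
    return min(max(1 - w, individual_utility(lottery, u, opt)) for u, w, opt in agents)

# Example 2: agent 1 is optimistic, agent 2 is pessimistic, both have weight 1.
L = {"xa": 1, "xb": 1}
agents = [({"xa": 1, "xb": 0}, 1, True), ({"xa": 0, "xb": 1}, 1, False)]
print(u_ante_max(L, agents))  # 1
print(u_ante_min(L, agents))  # 0
```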
These criteria can be considered as a generalization of the four ex-ante utilities: using the min (resp. max) oriented aggregation, $U^{min}_{ante}$ (resp. $U^{max}_{ante}$) is equal to $U^{-min}_{ante}$ (resp. $U^{-max}_{ante}$) if all agents are pessimistic, and it is equal to $U^{+min}_{ante}$ (resp. $U^{+max}_{ante}$) if they are all optimistic. Obviously, the egalitarian utility $U^{min}_{ante}$ is related to its non-egalitarian counterpart by duality. Formally, it holds that:

Proposition 1. Let $P = \langle \mathcal{L}, \vec{w}, \vec{u} \rangle$ be a qualitative collective decision problem and let $P^{\tau} = \langle \mathcal{L}, \vec{w}, \vec{u}^{\tau} \rangle$ be its dual problem, i.e., the problem such that for each agent we consider his/her dual attitude and, for any $x_j \in X$, $i \in A$, we define $u_i^{\tau}(x_j) = 1 - u_i(x_j)$. Then, for any $L \in \mathcal{L}$:
\[ U^{\tau max}_{ante}(L) = 1 - U^{min}_{ante}(L) \quad \text{and} \quad U^{\tau min}_{ante}(L) = 1 - U^{max}_{ante}(L). \]
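The duality of Proposition 1 can be checked empirically on random instances. The sketch below is only a sanity check, not a proof; it assumes a hypothetical five-level scale encoded with exact fractions, and the helper names are illustrative.

```python
import random
from fractions import Fraction

SCALE = [Fraction(k, 4) for k in range(5)]  # hypothetical 5-level scale in [0, 1]

def individual_utility(lottery, u, optimistic):
    if optimistic:
        return max(min(lottery[x], u[x]) for x in lottery)
    return min(max(1 - lottery[x], u[x]) for x in lottery)

def u_ante_max(lottery, agents):
    return max(min(w, individual_utility(lottery, u, opt)) for u, w, opt in agents)

def u_ante_min(lottery, agents):
    return min(max(1 - w, individual_utility(lottery, u, opt)) for u, w, opt in agents)

def dual(agents):
    """Dual problem: complement every utility and flip every attitude."""
    return [({x: 1 - v for x, v in u.items()}, w, not opt) for u, w, opt in agents]

def check_duality(trials=1000, seed=0):
    """Return True if both identities hold on every sampled problem."""
    rng = random.Random(seed)
    outcomes = ["x1", "x2", "x3"]
    for _ in range(trials):
        lottery = {x: rng.choice(SCALE) for x in outcomes}
        agents = [({x: rng.choice(SCALE) for x in outcomes},
                   rng.choice(SCALE), rng.choice([True, False]))
                  for _ in range(3)]
        if u_ante_max(lottery, dual(agents)) != 1 - u_ante_min(lottery, agents):
            return False
        if u_ante_min(lottery, dual(agents)) != 1 - u_ante_max(lottery, agents):
            return False
    return True

print(check_duality())  # True
```

Exact fractions are used so that the complement 1 - v does not introduce floating-point rounding into the equality tests.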
Likewise, if agents are equally important, weights can be ruled out and the proposed criteria can be simplified as follows:

Definition 2. Given a possibilistic lottery $L$ on $X$, a set of equally important agents $A$ where each agent is either optimistic or pessimistic, and a vector of utility functions $\vec{u}$, let:
\[ U^{max}_{ante}(L) = \max_{i \in A} \big(\otimes_{x_j \in X} \oplus(u_i(x_j), \Lambda[x_j])\big), \tag{15} \]
\[ U^{min}_{ante}(L) = \min_{i \in A} \big(\otimes_{x_j \in X} \oplus(u_i(x_j), \Lambda[x_j])\big), \tag{16} \]
where $\otimes = \min$ (resp. $\max$), $\oplus = \max$ (resp. $\min$) and $\Lambda[x_j] = 1 - L[x_j]$ (resp. $L[x_j]$) if the agent $i$ is pessimistic (resp. optimistic).
4. Collective sequential decision making under qualitative uncertainty

4.1. Definition of collective possibilistic decision trees

Representation formalisms such as decision trees [40], influence diagrams [28] and Markov decision processes [3] offer a clear description of sequential decision problems and allow the definition of optimal strategies. In recent years, there has been a growing interest in more complex problems (namely multi-criteria or multi-objective decision making) and several extensions of classical graphical models [9,26,32,33] have emerged to represent such cases. These proposals rely on probability theory and the well-known expected utility criterion to solve the problem. However, this criterion fails to represent all decision makers' behaviors.

With the growth of qualitative frameworks, especially possibility theory, many authors have advocated this ordinal view of decision making and have given rise to qualitative decision models, i.e., possibilistic decision trees [24], possibilistic influence diagrams [24] and possibilistic Markov decision processes [42]. Other non-possibilistic qualitative paradigms have also been developed; see, among others, the works presented in [10,31,34]. In this paper, however, we focus only on possibilistic formalisms because they use the pessimistic and optimistic utilities, which are the ordinal counterparts of the expected utility criterion and have received an axiomatic justification by Dubois and Prade [18,20].

In the remainder of this Section, we propose to solve sequential collective decision problems under qualitative uncertainty. To the best of our knowledge, this work is the first attempt to solve such problems, and we limit ourselves to the use of decision trees, because even in this simple, explicit formalism, the set of potential strategies is combinatorial (i.e., its size increases exponentially with the size of the tree) and the determination of an optimal strategy for a given problem under each of the proposed decision rules is an algorithmic issue in itself.

Decision trees, proposed by Raiffa [40] in 1968, are the most popular graphical models. They encode the structure of sequential problems by representing all possible scenarios. The graphical component of a decision tree is a directed labeled tree $DT = (\mathcal{N}, \mathcal{E})$ where $\mathcal{E}$ is the set of edges and $\mathcal{N} = \mathcal{D} \cup \mathcal{C} \cup \mathcal{LN}$ is the set of nodes, which contains three kinds of nodes: $\mathcal{D}$, the set of decision nodes (represented by squares); $\mathcal{C}$, the set of chance nodes (represented by circles); and $\mathcal{LN}$, the set of leaves. In this formalism, the root of the tree is generally a decision node, denoted by $D_0$. $Succ(N)$ denotes the set of children nodes of node $N$. For any $D_i \in \mathcal{D}$, $Succ(D_i) \subseteq \mathcal{C}$, i.e., a chance node (an action) must be chosen at each decision node. For any $C_i \in \mathcal{C}$, $Succ(C_i) \subseteq \mathcal{LN} \cup \mathcal{D}$: the outcome of an action is either a leaf node or a decision node (and then a new action should be executed).

The numerical component of decision trees consists in assigning utility values to leaf nodes and labeling the edges outgoing from chance nodes. The quantification of a decision tree depends essentially on the nature of the uncertainty pertaining to the problem and on the theory used to represent it. In their classical version, decision trees are probabilistic. However, when the available information is ordinal, Garcia et al. [24] propose to label the leaves by utility degrees in the unit scale $[0,1]$ and to represent the uncertainty pertaining to the possible outcomes of each $C_i$ by a conditional possibility distribution $\pi_i$ on $Succ(C_i)$, such that $\forall N \in Succ(C_i)$, $\pi_i(N) = \Pi(N \mid path(C_i))$, where $path(C_i)$ denotes all the value assignments of chance and decision nodes on the path from the root $D_0$ to $C_i$.

Solving a decision tree amounts to building a complete strategy that selects an action (a chance node) for each decision node: a strategy is a mapping $\delta : \mathcal{D} \mapsto \mathcal{C} \cup \{\perp\}$. $\delta(D_i) = \perp$ means that no action has been selected for $D_i$ ($\delta$ is partial). To select the optimal strategy, the authors of [24] propose to evaluate and compare strategies w.r.t. the pessimistic and optimistic utilities axiomatized in [18]. Leaf nodes being labeled with utility degrees, the rightmost chance nodes (i.e., chance nodes on the far right-hand side) can be seen as simple possibilistic lotteries. Then, each strategy $\delta$ can be viewed as a connected sub-tree of the decision tree and is identified with a possibilistic compound lottery $L_\delta$, i.e., with a possibility distribution over a set of (simple or compound) lotteries. Any compound lottery is denoted by $\langle \lambda_1/L_1, \dots, \lambda_m/L_m \rangle$ and it can be reduced into an equivalent simple lottery$^1$ as follows [18]:
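\[ Reduction(\langle \lambda_1/L_1, \dots, \lambda_m/L_m \rangle) = \Big\langle \max_{k=1,\dots,m} \min(\lambda^k_1, \lambda_k)/x_1, \ \dots, \ \max_{k=1,\dots,m} \min(\lambda^k_n, \lambda_k)/x_n \Big\rangle, \]
where $\lambda_k$ is the possibility of getting lottery $L_k$ according to $L$ and $\lambda^k_j = \pi_{L_k}(x_j)$ is the conditional possibility of getting $x_j$ from $L_k$. Hence, the pessimistic and optimistic utility of a strategy $\delta$ can be computed on the basis of the reduction of $L_\delta$: the utility of a strategy $\delta$ is then the one of $Reduction(L_\delta)$.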
To define collective qualitative decision trees, we take up the same graphical and numerical components as possibilistic decision trees [24], except for leaf nodes, which are evaluated according to several agents instead of a single one.

Each leaf node $LN$ is now labeled by a vector $\vec{u}(LN) = \langle u_1(LN), \dots, u_p(LN) \rangle$ rather than by a single utility degree (see Fig. 2). A strategy still leads to a compound lottery and can be reduced, thus leading in turn to a simple (but multi-agent) lottery. We can now compare strategies according to any of the collective decision rules $O$ previously presented ($U^{-min}_{post}$, $U^{+min}_{post}$, $U^{-max}_{post}$, $U^{+max}_{post}$, $U^{-min}_{ante}$, $U^{+min}_{ante}$, $U^{-max}_{ante}$, $U^{+max}_{ante}$, $U^{min}_{ante}$ and $U^{max}_{ante}$) by comparing their reductions. Formally, given two strategies $\delta_1$ and $\delta_2$, and a collective decision rule $O$:
\[ \delta_1 \succeq_O \delta_2 \ \text{ iff } \ U_O(\delta_1) \geq U_O(\delta_2), \quad \text{where } \forall \delta, \ U_O(\delta) = U_O(Reduction(L_\delta)). \tag{17} \]

Example 3. Consider the tree of Fig. 2, involving two equally important agents, and the strategy $\delta(D_0) = C_1$, $\delta(D_1) = C_3$, $\delta(D_2) = C_5$. It holds that:
\[ L_\delta = \langle 1/L_{C_3}, 0.9/L_{C_5} \rangle \quad \text{with } L_{C_3} = \langle 0.5/x_a, 1/x_b \rangle, \ L_{C_5} = \langle 0.2/x_a, 1/x_b \rangle. \]
The reduction of $L_\delta$ can be computed:
\[ Reduction(L_\delta) = \langle \max(0.5, 0.2)/x_a, \max(1, 0.9)/x_b \rangle = \langle 0.5/x_a, 1/x_b \rangle. \]
So, if we consider for instance the $U^{+min}_{ante}$ criterion, we get:
\[ U^{+min}_{ante}(\delta) = \min\big(\max(\min(0.5, 0.3), \min(1, 0.6)),\ \max(\min(0.5, 0.8), \min(1, 0.4))\big) = 0.5. \]

Fig. 2. Collective possibilistic decision tree of Example 3.
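The reduction of a compound lottery and the evaluation of Example 3 can be sketched as follows. The representation (a compound lottery as a list of pairs) and the function names are assumptions; the leaf utilities are read off the computation of Example 3, since Fig. 2 is not reproduced in the text.

```python
def reduce_lottery(compound):
    """Reduce a compound lottery given as a list of (lambda_k, simple lottery L_k)."""
    outcomes = set().union(*(lk for _, lk in compound))
    return {x: max(min(lam, lk.get(x, 0)) for lam, lk in compound) for x in outcomes}

def u_plus_min_ante(lottery, utilities, weights):
    """Eq. (7): optimistic agents, conjunctive aggregation."""
    return min(max(1 - w, max(min(u[x], lottery[x]) for x in lottery))
               for u, w in zip(utilities, weights))

# Compound lottery of the strategy of Example 3
L_C3 = {"xa": 0.5, "xb": 1}
L_C5 = {"xa": 0.2, "xb": 1}
reduced = reduce_lottery([(1, L_C3), (0.9, L_C5)])
print(reduced == {"xa": 0.5, "xb": 1})  # True

# Utilities of the two agents, as they appear in the computation of Example 3
u1 = {"xa": 0.3, "xb": 0.6}
u2 = {"xa": 0.8, "xb": 0.4}
print(u_plus_min_ante(reduced, [u1, u2], [1, 1]))  # 0.5
```

Evaluating a strategy this way is straightforward, but enumerating all strategies in order to compare them is not, which is what motivates the algorithms of the next section.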
The definition proposed by Eq. (17) is intuitive but raises an algorithmic challenge: the set of strategies to compare is exponential w.r.t. the size of the tree, which makes the explicit evaluation of all strategies unrealistic. The sequel of the paper provides an algorithmic study of the problem, applying variants of Dynamic Programming whenever possible.

4.2. Optimization in collective possibilistic trees

4.2.1. Dynamic Programming as a tool for ex-post utilities
Dynamic Programming [2] is an efficient procedure of strategy optimization. It proceeds by backward induction: the problem is handled from the end (in our case, from the leaves); the last decision nodes are considered first, and then recursively until the root is reached. More specifically, the algorithm can be described as follows. When a chance node $C_i$ is reached, an optimal sub-strategy is built for each of its children. These sub-strategies are combined w.r.t. their uncertainty degrees. Then, the resulting compound strategy is reduced to an equivalent simple lottery representing the current optimal sub-strategy. When a decision node $D_i$ is reached, we select a decision $D^*$ among all the possible alternatives $N \in Succ(D_i)$ leading to an optimal sub-strategy w.r.t. $\succeq_O$. The choice is performed by comparing the simple lotteries equivalent to each sub-strategy.

This algorithm is sound and complete as soon as $\succeq_O$ is complete, transitive and satisfies the principle of weak monotonicity,$^2$ which ensures that each sub-strategy of an optimal strategy is optimal in its sub-tree. Formally, a decision rule $O$ is weakly monotonic iff, whatever $L$, $L'$ and $L''$, whatever $(\alpha, \beta)$ such that $\max(\alpha, \beta) = 1$:
\[ L \succeq_O L' \ \Rightarrow \ \langle \alpha/L, \beta/L'' \rangle \succeq_O \langle \alpha/L', \beta/L'' \rangle. \]

Each of the ex-post criteria satisfies transitivity, completeness and weak monotonicity, because it collapses to either a classical pessimistic ($U^{-}$) or optimistic ($U^{+}$) utility, which satisfies these properties [8,24].

The adaptation of Dynamic Programming to the ex-post decision rules is detailed in Algorithm 1. This algorithm computes the collective utility relative to each possible consequence by aggregating the utility values of each leaf, and then builds an optimal strategy from the last decision nodes to the root of the tree, using the principle defined in [24,43] for classical (mono-agent) possibilistic decision trees.
4.2.2. Dynamic Programming for ex-ante utilities

When the decision rule follows the ex-ante approach, the application of Dynamic Programming is a little trickier. The ex-ante Dynamic Programming we propose (see Algorithm 2) keeps at each node a vector of $p$ pessimistic (resp. optimistic) utilities, one for each agent. The computation of the ex-ante utility can then be performed each time a decision is to be made. Recall that $U^{-min}_{ante} = U^{-min}_{post}$ and $U^{+max}_{ante} = U^{+max}_{post}$. Hence, for these two criteria, the optimization could also be performed using the ex-post algorithm.

2 It is common knowledge in sequential decision making that monotonicity is a necessary and sufficient condition for the optimality of Dynamic
Algorithm 1: DynProgPost: Ex-post Dynamic Programming.
Data: T: a decision tree, N: a node of T
Result: u*: the value of the optimal strategy δ* (δ* is stored as a global variable)
begin
    u* = 0;  // Initialization
    if N ∈ LN then  // Leaf: CDM aggregation
        for i ∈ {1, ..., p} do u_N ← (u_N ⊕ (u_i ⊗ ω_i));
        // ⊗ = min, ω_i = w_i, ⊕ = max for disjunctive aggregation
        // ⊗ = max, ω_i = 1 − w_i, ⊕ = min for conjunctive aggregation
    if N ∈ C then  // Chance node: computes the qualitative utility
        foreach Y ∈ Succ(N) do u_N ← (u_N ⊕ (λ_Y ⊗ DynProgPost(T, Y)));
        // ⊗ = min, λ_Y = π(Y), ⊕ = max for optimistic utility
        // ⊗ = max, λ_Y = 1 − π(Y), ⊕ = min for pessimistic utility
    if N ∈ D then  // Decision node: determines the best decision
        foreach Y ∈ Succ(N) do
            u_Y ← DynProgPost(T, Y);
            if u_Y ≥ u* then δ(N) ← Y and u* ← u_Y;
    return u*;
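The backward induction of Algorithm 1 can be sketched in runnable form. This is an illustrative reading of the pseudocode, not the authors' implementation: the node classes, the Boolean parameters and the sample tree are assumptions, and the strategy is returned through a dictionary rather than a global variable.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    utilities: list   # one utility degree per agent

@dataclass
class Chance:
    name: str
    children: list    # list of (possibility degree, child node)

@dataclass
class Decision:
    name: str
    children: list    # list of Chance nodes (the available actions)

def dyn_prog_post(node, weights, optimistic, disjunctive, strategy):
    """Return the ex-post value of the sub-tree rooted at `node`; fill `strategy`."""
    if isinstance(node, Leaf):
        # Collective utility of the leaf, Eq. (1) or Eq. (2)
        pairs = list(zip(weights, node.utilities))
        if disjunctive:
            return max(min(w, u) for w, u in pairs)
        return min(max(1 - w, u) for w, u in pairs)
    if isinstance(node, Chance):
        # Qualitative utility of the lottery over the children's values
        values = [(lam, dyn_prog_post(child, weights, optimistic, disjunctive, strategy))
                  for lam, child in node.children]
        if optimistic:
            return max(min(lam, v) for lam, v in values)
        return min(max(1 - lam, v) for lam, v in values)
    # Decision node: keep the action with the best value
    best_value, best_action = None, None
    for action in node.children:
        value = dyn_prog_post(action, weights, optimistic, disjunctive, strategy)
        if best_value is None or value > best_value:
            best_value, best_action = value, action
    strategy[node.name] = best_action.name
    return best_value

# Hypothetical one-stage tree with two agents of weight 1
tree = Decision("D0", [
    Chance("C1", [(1.0, Leaf([0.25, 0.75])), (0.5, Leaf([1.0, 1.0]))]),
    Chance("C2", [(1.0, Leaf([0.5, 0.5])), (0.5, Leaf([0.0, 1.0]))]),
])
cautious, bold = {}, {}
print(dyn_prog_post(tree, [1, 1], False, False, cautious), cautious)  # 0.5 {'D0': 'C2'}
print(dyn_prog_post(tree, [1, 1], True, True, bold), bold)            # 0.75 {'D0': 'C1'}
```

On this small tree, the pessimistic conjunctive rule and the optimistic disjunctive rule select different actions at the root.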
Algorithm 2: DynProgAnte: Ex-ante Dynamic Programming.
Data: T: a decision tree, N: a node of T
Result: u*: the value of the optimal strategy δ* (δ* is stored as a global variable)
begin
    u* = 0;  // Initialization
    if N ∈ LN then  // Leaf
        for i ∈ {1, ..., p} do \vec{u}_N[i] ← u_i;
    if N ∈ C then  // Chance node: computes the utility vectors
        for i ∈ {1, ..., p} do \vec{u}_N[i] ← ε;
        foreach Y ∈ Succ(N) do
            \vec{u}_Y ← DynProgAnte(T, Y);
            for i ∈ {1, ..., p} do \vec{u}_N[i] ← (\vec{u}_N[i] ⊕ (λ_Y ⊗ \vec{u}_Y[i]));
        // Optimistic utility: ⊗ = min, λ_Y = π(Y), ⊕ = max, ε ← 0
        // Pessimistic utility: ⊗ = max, λ_Y = 1 − π(Y), ⊕ = min, ε ← 1
    if N ∈ D then  // Decision node
        foreach Y ∈ Succ(N) do
            v_Y ← ε; \vec{u}_Y ← DynProgAnte(T, Y);
            for i ∈ {1, ..., p} do v_Y ← v_Y ⊕ (\vec{u}_Y[i] ⊗ ω_i);
            if v_Y > u* then δ(N) ← Y, \vec{u}_N ← \vec{u}_Y and u* ← v_Y;
        // Disjunctive CDM: let ⊗ = min, ω_i = w_i, ⊕ = max, ε ← 0
        // Conjunctive CDM: let ⊗ = max, ω_i = 1 − w_i, ⊕ = min, ε ← 1
    return u*;
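A runnable reading of Algorithm 2 is sketched below. It is an illustration under assumptions: the node classes, the parameters and the sample tree are hypothetical, all agents share the same attitude, and the function returns the vector of individual utilities kept at the node, the collective value being computed by a separate helper.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    utilities: list   # one utility degree per agent

@dataclass
class Chance:
    name: str
    children: list    # list of (possibility degree, child node)

@dataclass
class Decision:
    name: str
    children: list    # list of Chance nodes (the available actions)

def aggregate(vector, weights, disjunctive):
    """Collective value of a vector of individual utilities, Eq. (1) or Eq. (2)."""
    if disjunctive:
        return max(min(w, u) for w, u in zip(weights, vector))
    return min(max(1 - w, u) for w, u in zip(weights, vector))

def dyn_prog_ante(node, weights, optimistic, disjunctive, strategy):
    """Return the vector of individual utilities kept at `node`; fill `strategy`."""
    if isinstance(node, Leaf):
        return list(node.utilities)
    if isinstance(node, Chance):
        vectors = [(lam, dyn_prog_ante(child, weights, optimistic, disjunctive, strategy))
                   for lam, child in node.children]
        agents = range(len(weights))
        if optimistic:
            return [max(min(lam, v[i]) for lam, v in vectors) for i in agents]
        return [min(max(1 - lam, v[i]) for lam, v in vectors) for i in agents]
    # Decision node: aggregate each candidate vector and keep the best action
    best = None  # (collective value, vector, action)
    for action in node.children:
        vector = dyn_prog_ante(action, weights, optimistic, disjunctive, strategy)
        value = aggregate(vector, weights, disjunctive)
        if best is None or value > best[0]:
            best = (value, vector, action)
    strategy[node.name] = best[2].name
    return best[1]

# Hypothetical one-stage tree with two pessimistic agents of weight 1
tree = Decision("D0", [
    Chance("C1", [(1.0, Leaf([0.25, 0.75])), (0.5, Leaf([1.0, 1.0]))]),
    Chance("C2", [(1.0, Leaf([0.5, 0.5])), (0.5, Leaf([0.0, 1.0]))]),
])
strategy = {}
vector = dyn_prog_ante(tree, [1, 1], False, True, strategy)
print(vector, aggregate(vector, [1, 1], True), strategy)  # [0.25, 0.75] 0.75 {'D0': 'C1'}
```

With a single decision node the choice is trivially optimal; on deeper trees the local choice made at each decision node is only a heuristic when the criterion is not monotonic.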
As shown by Counter-Example 1, the $U^{-max}_{ante}$ and $U^{+min}_{ante}$ decision rules do not satisfy the monotonicity principle [4]. Hence, Algorithm 2 may provide a good strategy but without any guarantee of optimality. It can nevertheless be considered as an approximation algorithm when used for optimizing either of these problematic criteria.
Counter-Example 1. Consider the set of consequences $X = \{x_1, x_2, x_3\}$ and two equally important agents 1 and 2 ($w_1 = w_2 = 1$) with: $u_1(x_1) = 1$, $u_1(x_2) = 0.8$, $u_1(x_3) = 0.5$; $u_2(x_1) = 0.6$, $u_2(x_2) = 0.8$, $u_2(x_3) = 0.8$.

Consider the lotteries $L_1 = \langle 1/x_1, 0/x_2, 0/x_3 \rangle$, $L_2 = \langle 0/x_1, 1/x_2, 0/x_3 \rangle$ and $L_3 = \langle 0/x_1, 0/x_2, 1/x_3 \rangle$: $L_1$ gives consequence $x_1$ for sure, $L_2$ gives consequence $x_2$ for sure and $L_3$ gives consequence $x_3$ for sure. It holds that:
\[ U^{-max}_{ante}(L_1) = \max_{i=1,2} U_i^{-}(L_1) = \max(1, 0.6) = 1. \]
\[ U^{-max}_{ante}(L_2) = \max_{i=1,2} U_i^{-}(L_2) = \max(0.8, 0.8) = 0.8. \]
Hence $L_1 \succ L_2$ with respect to the $U^{-max}_{ante}$ rule.

Consider now the compound lotteries $L = \langle 1/L_1, 1/L_3 \rangle$ and $L' = \langle 1/L_2, 1/L_3 \rangle$. If the weak monotonicity principle were satisfied, we would get: $U^{-max}_{ante}(L) \geq U^{-max}_{ante}(L')$.
Fig. 3. Lotteries of Counter-Example2.
Reduction(h1/L1,1/L3i)= h1/x1,0/x2,1/x3iand
Reduction(h1/L2,1/L3i)= h0/x1,1/x2,1/x3i.Itholdsthat:
Uante−max(L)=Uante−max(Reduction(h1/L1,1/L3i))=0.6.
Uante−max(L′)=Uante−max(Reduction(h1/L2,1/L3i))=0.8.
Uante−max(L) <Uante−max(L′)while Uante−max(L1) >Uante−max(L2). So, Uante−max is not monotonic.
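The figures of Counter-Example 1 can be re-derived mechanically. The sketch below is only an illustration: it assumes the usual possibilistic definitions (pessimistic utility $\min_x \max(1 - \pi(x), u(x))$ and reduction $\pi(x) = \max_j \min(\lambda_j, \pi_j(x))$) together with the max-based aggregation of Eq. (18); the function and variable names are hypothetical and do not come from the authors' implementation.

```python
def pessimistic_utility(lottery, utility):
    """U^-(L) = min over x of max(1 - pi(x), u(x))."""
    return min(max(1 - pi, utility[x]) for x, pi in lottery.items())


def reduce_compound(branches):
    """Reduce <l_1/L_1, ..., l_m/L_m> to a simple lottery:
    pi(x) = max over j of min(l_j, pi_j(x))."""
    outcomes = branches[0][1].keys()
    return {x: max(min(l, sub[x]) for l, sub in branches) for x in outcomes}


def u_ante_minus_max(lottery, utilities, weights):
    """U^{-max}_{ante}(L) = max over agents i of min(w_i, U^-_i(L))."""
    return max(min(w, pessimistic_utility(lottery, u))
               for u, w in zip(utilities, weights))


# Data of Counter-Example 1 (two equally important agents).
utilities = [
    {"x1": 1.0, "x2": 0.8, "x3": 0.5},  # agent 1
    {"x1": 0.6, "x2": 0.8, "x3": 0.8},  # agent 2
]
weights = [1.0, 1.0]

L1 = {"x1": 1.0, "x2": 0.0, "x3": 0.0}
L2 = {"x1": 0.0, "x2": 1.0, "x3": 0.0}
L3 = {"x1": 0.0, "x2": 0.0, "x3": 1.0}
L = reduce_compound([(1.0, L1), (1.0, L3)])
L_prime = reduce_compound([(1.0, L2), (1.0, L3)])

scores = {
    name: u_ante_minus_max(lot, utilities, weights)
    for name, lot in [("L1", L1), ("L2", L2), ("L", L), ("L'", L_prime)]
}
print(scores)
```

The preference between $L_1$ and $L_2$ is reversed once both are combined with $L_3$, which is exactly the monotonicity failure exhibited above.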
Using the fact that $U^{+\min}_{ante} = 1 - U^{\tau\,-\max}_{ante}$ [4], this counter-example is modified to show that $U^{+\min}_{ante}$ does not satisfy the monotonicity principle either.
Consider two equally important agents, 1 and 2, with $w_1 = w_2 = 1$ and utilities
$$u^{\tau}_1(x_1) = 0,\; u^{\tau}_1(x_2) = 0.2,\; u^{\tau}_1(x_3) = 0.5;\quad u^{\tau}_2(x_1) = 0.4,\; u^{\tau}_2(x_2) = 0.2,\; u^{\tau}_2(x_3) = 0.2.$$
Consider now the same lotteries $L_1$, $L_2$ and $L_3$ presented above. It holds that:
$$U^{+\min}_{ante}(L_1) = \min_{i=1,2} U^{+}_i(L_1) = 0 < U^{+\min}_{ante}(L_2) = \min_{i=1,2} U^{+}_i(L_2) = 0.2,$$
while
$$U^{+\min}_{ante}(Reduction(\langle 1/L_1, 1/L_3 \rangle)) = 0.4 > U^{+\min}_{ante}(Reduction(\langle 1/L_2, 1/L_3 \rangle)) = 0.2,$$
which contradicts the weak monotonicity.
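The optimistic variant of the counter-example can be checked in the same way. This is a minimal sketch assuming the standard optimistic utility $\max_x \min(\pi(x), u(x))$ and the egalitarian aggregation $\min_i \max(1 - w_i, U^+_i(L))$; the reduced lotteries are copied from the text and all identifiers are illustrative.

```python
def optimistic_utility(lottery, utility):
    """U^+(L) = max over x of min(pi(x), u(x))."""
    return max(min(pi, utility[x]) for x, pi in lottery.items())


def u_ante_plus_min(lottery, utilities, weights):
    """U^{+min}_{ante}(L) = min over agents i of max(1 - w_i, U^+_i(L))."""
    return min(max(1 - w, optimistic_utility(lottery, u))
               for u, w in zip(utilities, weights))


# Transformed utilities u^tau of the two agents.
tau_utilities = [
    {"x1": 0.0, "x2": 0.2, "x3": 0.5},  # agent 1
    {"x1": 0.4, "x2": 0.2, "x3": 0.2},  # agent 2
]
weights = [1.0, 1.0]

lotteries = {
    "L1": {"x1": 1.0, "x2": 0.0, "x3": 0.0},
    "L2": {"x1": 0.0, "x2": 1.0, "x3": 0.0},
    # Reductions of <1/L1, 1/L3> and <1/L2, 1/L3>, as given in the text.
    "L": {"x1": 1.0, "x2": 0.0, "x3": 1.0},
    "L'": {"x1": 0.0, "x2": 1.0, "x3": 1.0},
}

scores = {name: u_ante_plus_min(lot, tau_utilities, weights)
          for name, lot in lotteries.items()}
print(scores)
```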
Likewise, the ex-post Dynamic Programming Algorithm (Algorithm 1) shall also be considered as an approximation algorithm for $U^{-\max}_{ante}$ and $U^{+\min}_{ante}$ since they are correlated to their ex-post counterparts as shown in [6]. Indeed, it holds that:

Proposition 2.
$$U^{+\min}_{ante}(L) \geq U^{+\min}_{post}(L).$$
$$U^{-\max}_{ante}(L) \leq U^{-\max}_{post}(L).$$

So, even if it is not always the case, it often happens that $U^{-\max}_{post} = U^{-\max}_{ante}$ (resp. $U^{+\min}_{post} = U^{+\min}_{ante}$); in these cases the solution provided by the ex-post Algorithm is optimal.
Algorithm 2 also applies to the optimization of the generalized ex-ante decision rules, namely the $U^{\max}_{ante}$ and $U^{\min}_{ante}$ utilities for heterogeneous agents. However, obtaining the optimal strategy using Dynamic Programming is guaranteed only for monotonic criteria. Counter-Example 2 shows that $U^{\max}_{ante}$ as well as $U^{\min}_{ante}$ do not satisfy this property.

Counter-Example 2. Consider two agents 1 and 2 having the same importance ($w_1 = w_2 = 1$), 1 being optimistic and 2 being pessimistic, and consider the three lotteries on $X = \{x_1, x_2\}$ depicted in Fig. 3. Let $L$ and $L'$ be two compound lotteries defined by: $L = \langle 0.6/L_1, 1/L_3 \rangle$ and $L' = \langle 0.6/L_2, 1/L_3 \rangle$.
We can verify that $L_1$ is globally preferred to $L_2$, where:
$$U^{\max}_{ante}(L_1) = 0.9 > U^{\max}_{ante}(L_2) = 0.8 \quad \text{whereas} \quad U^{\max}_{ante}(L) = 0.6 < U^{\max}_{ante}(L') = 0.8,$$
which proves that $U^{\max}_{ante}$ is not monotonic.
Since $U^{\min}_{ante} = 1 - U^{\tau\,\max}_{ante}$ (Proposition 1), this counter-example can be modified to show that $U^{\min}_{ante}$ does not satisfy the monotonicity principle either. We consider the agents 1 and 2 with the same importance ($w_1 = w_2 = 1$), where 1 is pessimistic and 2 is optimistic, and we replace the utility functions for lotteries $L_1$, $L_2$, $L_3$, $L$ and $L'$ as follows: $u^{\tau}_1(x_1) = 0.1$, $u^{\tau}_2(x_1) = 0.4$, $u^{\tau}_1(x_2) = 0.2$, $u^{\tau}_2(x_2) = 0.2$, $u^{\tau}_1(x_3) = 0.5$ and $u^{\tau}_2(x_3) = 0.2$. We can check that $L_2$ is better than $L_1$ since $U^{\min}_{ante}(L_1) = 0.1 < U^{\min}_{ante}(L_2) = 0.2$, while $U^{\min}_{ante}(L) = 0.4 > U^{\min}_{ante}(L') = 0.2$, which proves that $U^{\min}_{ante}$ does not satisfy the monotonicity property.
4.2.3. Right optimization of $U^{-\max}_{ante}$ by Multi-Dynamic Programming
The lack of monotonicity of $U^{-\max}_{ante}$ is not dramatic, even when optimality must be guaranteed. Indeed, with $U^{-\max}_{ante}$ we look for a strategy that has a good pessimistic utility $U^{-}_i$ for at least one agent $i$. This means that if it is possible to get for each $i$ a strategy that optimizes $U^{-}_i$ (and this can be done by the classical Dynamic Programming, since the pessimistic utility is monotonic), the one with the highest value for $U^{-\max}_{ante}$ is globally optimal. Formally, $U^{-\max}_{ante}$ can be expressed as follows:
$$U^{-\max}_{ante}(L) = \max_{i=1,p} \min(w_i, U^{-}_i(L)) \qquad (18)$$
where $U^{-}_i(L)$ is the pessimistic utility of $L$ according to agent $i$.

Algorithm 3: MultiDynProg: right optimization of $U^{-\max}_{ante}$.
Data: T: a decision tree
Result: u*: the value of the optimal strategy δ* - δ* is stored as a global value
1 begin
2     u = 0; // Initialization
3     u* = 0; // Initialization
4     for i ∈ {1, ..., p} do
5         δ_i = ∅; // Initialization
6         δ_i ← PesDynProg(T, i); // Call to classical poss. Dyn. Prog. [24] - returns an optimal strategy and its value U^-_i(δ_i);
7         u ← min(U^-_i(δ_i), ω_i);
8         if u > u* then δ* ← δ_i; u* ← u;
9     return u*;

Fig. 4. Lotteries of Counter-Example 3.
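Corollary 1. Let $\mathcal{L}$ be the set of possibilistic lotteries that can be built on $X$, $L$ be any possibilistic lottery and let:
- $\mathcal{L}^* \subset \mathcal{L}$ s.t. $\mathcal{L}^* = \{L^*_1, \ldots, L^*_p\}$ and $\forall L \in \mathcal{L}$, $U^{-}_i(L^*_i) \geq U^{-}_i(L)$;
- $L^* \in \mathcal{L}^*$, s.t. $\forall L^*_i \in \mathcal{L}^*$: $\max_{i=1,p} \min(w_i, U^{-}_i(L^*)) \geq \max_{i=1,p} \min(w_i, U^{-}_i(L^*_i))$.
It holds that $U^{-\max}_{ante}(L^*) \geq U^{-\max}_{ante}(L)$, $\forall L \in \mathcal{L}$.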
It follows that the optimization problem can be solved by a series of $p$ calls to a classical (mono-agent) pessimistic optimization. This is the principle of the Multi-Dynamic Programming approach detailed by Algorithm 3. To reduce the execution time, this algorithm could be improved by considering only agents whose importance weight $w_i$ is greater than the $U^{-\max}_{ante}$ value of the current strategy.
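The principle behind Eq. (18) and Corollary 1 can be illustrated on an explicit, finite set of candidate lotteries. In the sketch below, the exhaustive search inside the loop is only a stand-in for the call to the classical pessimistic Dynamic Programming on a decision tree; `multi_dyn_prog`, `brute_force` and the candidate set are hypothetical names and data chosen for the illustration (the two candidates are the reduced lotteries of Counter-Example 1).

```python
def pessimistic_utility(lottery, utility):
    """U^-(L) = min over x of max(1 - pi(x), u(x))."""
    return min(max(1 - pi, utility[x]) for x, pi in lottery.items())


def multi_dyn_prog(candidates, utilities, weights):
    """One mono-agent pessimistic optimization per agent, then keep the best."""
    best_value, best_name = 0.0, None
    for u_i, w_i in zip(utilities, weights):
        # Stand-in for PesDynProg(T, i): best candidate for agent i alone.
        name_i = max(candidates,
                     key=lambda n: pessimistic_utility(candidates[n], u_i))
        value = min(pessimistic_utility(candidates[name_i], u_i), w_i)
        if value > best_value:
            best_value, best_name = value, name_i
    return best_value, best_name


def brute_force(candidates, utilities, weights):
    """Direct maximization of max_i min(w_i, U^-_i(L)) over all candidates."""
    return max(
        max(min(w, pessimistic_utility(lot, u))
            for u, w in zip(utilities, weights))
        for lot in candidates.values()
    )


utilities = [
    {"x1": 1.0, "x2": 0.8, "x3": 0.5},  # agent 1
    {"x1": 0.6, "x2": 0.8, "x3": 0.8},  # agent 2
]
weights = [1.0, 1.0]
candidates = {
    "L": {"x1": 1.0, "x2": 0.0, "x3": 1.0},
    "L'": {"x1": 0.0, "x2": 1.0, "x3": 1.0},
}

value, name = multi_dyn_prog(candidates, utilities, weights)
print(value, name, brute_force(candidates, utilities, weights))
```

On this instance the per-agent search returns the same value as the direct maximization, as Corollary 1 predicts.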
4.2.4. Right optimization of $U^{+\min}_{ante}$: a Branch and Bound algorithm
As previously said, the $U^{+\min}_{ante}$ utility does not satisfy monotonicity. So, ex-ante Dynamic Programming (Algorithm 2) can provide a good strategy, but without any guarantee of optimality. Besides, this criterion performs an egalitarian aggregation (it uses the min operator). Unfortunately, as shown in Counter-Example 3, it is then not possible to provide a result similar to Corollary 1 to circumvent the lack of monotonicity (as for $U^{-\max}_{ante}$).

Counter-Example 3. Consider two optimistic agents 1 and 2 having the same importance degree ($w_1 = w_2 = 1$), and consider the two lotteries $L$ and $L'$ on $X = \{x_1, x_2\}$ depicted in Fig. 4. $L$ gives consequence $x_1$ for sure and $L'$ gives consequence $x_2$ for sure. It holds that:
$$U^{+\min}_{ante}(L) = \min\big(\max((1-1), \max(\min(1, 0.7), \min(0, 0.4))),\; \max((1-1), \max(\min(1, 0.3), \min(0, 0.9)))\big) = 0.3.$$
$$U^{+\min}_{ante}(L') = \min\big(\max((1-1), \max(\min(0, 0.7), \min(1, 0.4))),\; \max((1-1), \max(\min(0, 0.3), \min(1, 0.9)))\big) = 0.4.$$
Hence, $L' \succ L$ with respect to the $U^{+\min}_{ante}$ rule. Then, $L'$ is the optimal lottery $L^*$. Now, we look for the strategy that optimizes $U^{+}_i$ for each agent $i$. We get: $U^{+}_1(L) = 0.7 > U^{+}_1(L') = 0.4$. So, $L^*_1 = L$.
We can check that $\max_{i \in \{1,2\}} \min(w_i, U_i(L^*_i)) = 0.7 > \max_{i \in \{1,2\}} \min(w_i, U_i(L^*)) = 0.4$, which proves that a result similar to Corollary 1 does not hold for $U^{+\min}_{ante}$.

Algorithm 4: B&B algorithm for the optimization of $U^{+\min}_{ante}$.
Data: T: a decision tree, δ: a (partial) strategy, u: an upper bound of U^{+min}_{ante}(δ)
Result: u*: the U^{+min}_{ante} value of the optimal strategy δ* found so far
begin
    if δ(D_0) = ⊥ then D_pend ← {D_0};
    else D_pend ← {D_i ∈ D s.t. ∃ D_j, δ(D_j) ≠ ⊥ and D_i ∈ Succ(δ(D_j))};
    if D_pend = ∅ then // δ is a complete strategy
        δ* ← δ; u* ← u;
    else
        D_next ← arg min_{D_i ∈ D_pend} i;
        foreach C_i ∈ Succ(D_next) do
            δ(D_next) ← C_i;
            u ← UpperBound(T, δ);
            if u > u* then u* ← B&B(u, δ);
    return u*;
To guarantee optimality, we have to propose an exact algorithm and proceed by an implicit enumeration via a Branch and Bound (B&B) algorithm, as done for Rank Dependent Utility [29] and for Possibilistic Choquet integrals [8] (both in the mono-agent case).
The Branch and Bound procedure described by Algorithm 4 takes as argument a partial strategy δ and an upper bound of the $U^{+\min}_{ante}$ value of its best extension. It returns u*, the $U^{+\min}_{ante}$ value of the best strategy δ* found so far. To reduce the search time, we can initialize δ* with any strategy, e.g. the one provided by Dynamic Programming (using Algorithm 2 or even Algorithm 1 proposed for the ex-post approach). At each step of the Branch and Bound algorithm, the current partial strategy δ is developed by the choice of an action for some unassigned decision node. When several decision nodes are candidates, the one with the minimal rank (i.e., the earliest one according to the temporal order) is developed. The recursive procedure backtracks when either the current strategy is complete (then δ* and u* are updated) or proves to be worse than the current δ*.
Function UpperBound(T, δ), outlined by Algorithm 5, provides an upper bound of the best completion of δ. In practice, it builds for each agent $i$ a strategy $δ_i$ that maximizes $U^{+}_i$ (using the algorithm of [41,43], which is linear). It then selects, among these strategies, the one with the highest $U^{+\min}_{ante}$. Notice that UpperBound(T, δ) = $U^{+\min}_{ante}(δ)$ when δ is complete. Whenever the value returned by UpperBound(T, δ) is lower than or equal to u*, the value of the best current strategy, the algorithm backtracks, yielding the choice of another action for the last considered decision node.
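The implicit enumeration described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the tree encoding (tuples tagged `"leaf"`, `"chance"`, `"decision"`) is hypothetical, pending decision nodes are picked by depth-first traversal rather than by minimal rank, and the bound used here is a simple relaxation (each optimistic agent is allowed its own best completion of δ) that may differ from Algorithm 5. The relaxation is a valid upper bound because no completion of δ can give agent $i$ more than its own optimum, and it coincides with $U^{+\min}_{ante}(δ)$ when δ is complete.

```python
def opt_bound(node, i, strat):
    """Best optimistic utility agent i can reach over all completions of strat."""
    kind = node[0]
    if kind == "leaf":
        return node[1][i]
    if kind == "chance":
        return max(min(pi, opt_bound(child, i, strat)) for pi, child in node[1])
    name, children = node[1], node[2]
    if name in strat:  # action already fixed by the partial strategy
        return opt_bound(children[strat[name]], i, strat)
    return max(opt_bound(child, i, strat) for child in children)


def upper_bound(root, weights, strat):
    """min_i max(1 - w_i, best U^+_i); exact when strat is complete."""
    return min(max(1 - w, opt_bound(root, i, strat))
               for i, w in enumerate(weights))


def next_pending(node, strat):
    """First reachable decision node (depth-first) with no action assigned."""
    kind = node[0]
    if kind == "leaf":
        return None
    if kind == "chance":
        for _, child in node[1]:
            found = next_pending(child, strat)
            if found is not None:
                return found
        return None
    name, children = node[1], node[2]
    if name not in strat:
        return node
    return next_pending(children[strat[name]], strat)


def branch_and_bound(root, weights, strat=None, best=None):
    strat = {} if strat is None else strat
    best = {"value": -1.0, "strategy": None} if best is None else best
    node = next_pending(root, strat)
    if node is None:  # complete strategy: the bound is its exact value
        value = upper_bound(root, weights, strat)
        if value > best["value"]:
            best["value"], best["strategy"] = value, dict(strat)
        return best
    name, children = node[1], node[2]
    for k in range(len(children)):
        strat[name] = k
        if upper_bound(root, weights, strat) > best["value"]:
            branch_and_bound(root, weights, strat, best)
        del strat[name]  # backtrack
    return best


# Small illustrative tree; leaves hold one utility per agent.
x1, x2, x3 = ("leaf", (0.0, 0.4)), ("leaf", (0.2, 0.2)), ("leaf", (0.5, 0.2))
d1 = ("decision", "D1", [x1, x2])
c1 = ("chance", [(1.0, d1), (1.0, x3)])
tree = ("decision", "D0", [c1, x2])

result = branch_and_bound(tree, [1.0, 1.0])
print(result)
```

On this tree, the branches D1 → x2 and D0 → x2 are pruned because their bound (0.2) does not exceed the value 0.4 of the first complete strategy found.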
4.2.5. Agents with different attitudes: right optimization of $U^{\max}_{ante}$ and $U^{\min}_{ante}$
Let us finally study the heterogeneous utilities $U^{\max}_{ante}$ and $U^{\min}_{ante}$ proposed in Section 3.2 to solve collective decision problems where the set of decision makers gathers pessimistic and optimistic agents. These decision rules are a generalization of ex-ante utilities, so it is not surprising that they do not satisfy the weak monotonicity (see Counter-Example 2).
4.2.6. Optimization for heterogeneous agents - the max-based rule
When optimizing $U^{\max}_{ante}$, we are looking for a strategy that maximizes the qualitative utility ($U^{-}_i$ or $U^{+}_i$) for at least one agent $i$: we get for each agent an optimal strategy with regard to $U^{\otimes}_i$, which is defined according to his/her attitude w.r.t. uncertainty, i.e., $U^{\otimes}_i = U^{-}_i$ (resp. $U^{\otimes}_i = U^{+}_i$) if the agent is pessimistic (resp. optimistic). Then, we select the strategy that maximizes the global utility $U^{\max}_{ante}$. Basically, the optimization of these criteria relies on the same idea as the one proposed for the optimization of $U^{-\max}_{ante}$. Formally, we can write:
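$$U^{\max}_{ante}(L) = \max_{i=1,p} \min(w_i, U^{\otimes}_i(L)); \qquad (19)$$
where $U^{\otimes}_i(L)$ denotes either the pessimistic or the optimistic utility of $L$ for agent $i$, his/her attitude w.r.t. uncertainty being captured by $\otimes$.

Corollary 2. Let $\mathcal{L}$ be the set of possibilistic lotteries that can be built on $X$, $L$ be any possibilistic lottery and let:
- $\mathcal{L}^* \subset \mathcal{L}$ s.t. $\mathcal{L}^* = \{L^*_1, \ldots, L^*_p\}$ and $\forall L \in \mathcal{L}$, $U^{\otimes}_i(L^*_i) \geq U^{\otimes}_i(L)$;
- $L^* \in \mathcal{L}^*$, s.t. $\forall L^*_i \in \mathcal{L}^*$: $\max_{i=1,p} \min(w_i, U^{\otimes}_i(L^*)) \geq \max_{i=1,p} \min(w_i, U^{\otimes}_i(L^*_i))$.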
Algorithm 5: UpperBound of B&B algorithm for the optimization of $U^{+\min}_{ante}$.
Data: T: a decision tree, δ: a (partial or complete) strategy, N: a node of T
Result: u(δ): the U^{+min}_{ante} value of the current strategy δ
1 begin
2     u_N = 0; // Initialization
3     u(δ) = 0; // Initialization
4     for i ∈ {1, ..., p} do
5         if N ∈ LN then // Leaf
6             u_N ← u_i;
7         if N ∈ C then // Chance node: computes the optimistic utility
8             foreach Y ∈ Succ(N) do
9                 u_Y ← DynProgAnte(T, Y); // Call to Ex-ante Dyn. Prog. (Algorithm 2)
10                u_N ← max(u_N, min(π_Y, u_Y));
11        if N ∈ D then // Decision node
12            if δ(N) ≠ ⊥ then // Prefixed action
13                δ_i(N) ← δ(N) and u_N ← u_Y
14            else
15                foreach Y ∈ Succ(N) do
16                    u_Y ← DynProgAnte(T, Y);
17                    if u_Y > u_N then
18                        δ_i(N) ← Y and u_N ← u_Y;
19        for j ∈ {1, ..., p} do
20            u = 0; // Initialization
21            U^+_j(δ_i) ← OptUtil(δ_i, j); // Computes for each agent j the value of its optimistic utility U^+_j(δ_i);
22            u ← min(u, max(U^+_j(δ_i), 1 − ω_i));
23        if u > u(δ) then u(δ) ← u;
24    return u(δ);
Algorithm 6: ImproHetMultiDynProg: right optimization of $U^{\max}_{ante}$.
Data: T: a decision tree; D_0: root of T
Result: u*: the value of the optimal strategy δ* - δ* is stored as a global variable
begin
    u* = 0; // Initialization
    δ = DynProgPost(T, D_0, Opt); // Opt is the subset of optimistic agents - returns an optimal strategy δ* for U^{+max}_{ante} and its optimal value u*
    foreach i ∉ Opt do
        if w_i > u* then
            δ_i = PesDynProg(T, i); // Call to classical poss. Dyn. Prog. [24] - returns an optimal strategy for the pessimistic utility U^-_i;
            u_i = min(w_i, U^-_i(δ_i));
            if u_i > u* then δ* ← δ_i; u* ← u_i;
    return u*;
This result allows the use of Multi-Dynamic Programming for the optimization of $U^{\max}_{ante}$. A first idea consists in a direct adaptation of MultiDynProg (Algorithm 3), replacing lines 6 and 7 respectively by:
line 6': δ_i ← DynProg(T, i); // Call to poss. Dyn. Prog. w.r.t. the agent attitude (Pes or Opt) [24].
line 7': u ← min(U^⊗_i(δ_i), w_i); // U^⊗_i = U^-_i if i is pessimistic and U^⊗_i = U^+_i if i is optimistic.
This algorithm (called HetMultiDynProg) can be improved by considering first the optimistic decision makers and then the pessimistic ones, instead of making $p$ calls to Dynamic Programming, one for each agent. This gives rise to the second version of Multi-Dynamic Programming (called ImproHetMultiDynProg) outlined by Algorithm 6. In short, this procedure can be described as follows: for optimistic agents, the optimization of $U^{\max}_{ante}$ comes down to the optimization of $U^{+\max}_{ante}$ using Dynamic Programming (either Algorithm 2 or Algorithm 1, since $U^{+\max}_{ante} = U^{+\max}_{post}$). Then, we consider only pessimistic agents with an importance degree $w_i$ higher than the current optimal value (obtained for optimistic agents), we compute their pessimistic utilities and select the strategy that maximizes $U^{\max}_{ante}$. Obviously, HetMultiDynProg and ImproHetMultiDynProg provide the same optimal solutions since they are both exact algorithms. However, ImproHetMultiDynProg needs fewer iterations, which reduces the execution time, as shown by the experimental study.
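The difference between the two variants can be illustrated on an explicit set of candidate lotteries, where an exhaustive search stands in for each Dynamic Programming call. The sketch below is hypothetical (agent data, candidate lotteries and function names are invented for the example); it only shows why skipping pessimistic agents whose weight does not exceed the current best value cannot change the result, since $\min(w_i, \cdot) \leq w_i$.

```python
def optimistic_utility(lottery, utility):
    """U^+(L) = max over x of min(pi(x), u(x))."""
    return max(min(pi, utility[x]) for x, pi in lottery.items())


def pessimistic_utility(lottery, utility):
    """U^-(L) = min over x of max(1 - pi(x), u(x))."""
    return min(max(1 - pi, utility[x]) for x, pi in lottery.items())


def het_multi(candidates, agents):
    """One optimization per agent (stand-in for HetMultiDynProg)."""
    best, calls = 0.0, 0
    for utility, weight, attitude in agents:
        calls += 1
        util = optimistic_utility if attitude == "opt" else pessimistic_utility
        best = max(best, min(weight, max(util(L, utility) for L in candidates)))
    return best, calls


def impro_het_multi(candidates, agents):
    """Optimistic agents in one pass, then only the pessimistic agents
    whose weight exceeds the current best value."""
    calls = 1  # single pass covering all optimistic agents
    best = max((min(w, optimistic_utility(L, u))
                for L in candidates
                for u, w, a in agents if a == "opt"), default=0.0)
    for utility, weight, attitude in agents:
        if attitude == "pes" and weight > best:
            calls += 1
            best = max(best, min(weight, max(pessimistic_utility(L, utility)
                                             for L in candidates)))
    return best, calls


candidates = [{"x1": 1.0, "x2": 0.5}, {"x1": 0.5, "x2": 1.0}]
agents = [
    ({"x1": 0.7, "x2": 0.2}, 0.8, "opt"),
    ({"x1": 0.3, "x2": 0.9}, 0.5, "pes"),  # skipped: weight 0.5 <= 0.7
    ({"x1": 0.9, "x2": 0.6}, 1.0, "pes"),
]

print(het_multi(candidates, agents), impro_het_multi(candidates, agents))
```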
4.2.7. Optimization for heterogeneous agents - the min-based rule
Let us consider the last awkward criterion, $U^{\min}_{ante}$. Since it is not monotonic, Dynamic Programming comes without guarantee of optimality. Thus, to obtain an optimal strategy, we adapt the Branch and Bound procedure (Algorithm 4) proposed for the optimization of $U^{+\min}_{ante}$ by adjusting the computation of UpperBound(T, δ). We retain the same principle, but here we compute the $U^{\min}_{ante}$ of the best completion of δ: for each agent, UpperBound(T, δ) builds a strategy $δ_i$ that maximizes $U^{\otimes}_i$, taking into account the agent's attitude w.r.t. uncertainty (⊗ = min if the agent is pessimistic and ⊗ = max if he/she is optimistic). Then, it selects among these strategies the one with the highest $U^{\min}_{ante}$. More specifically, we extend Algorithm 5 to deal with heterogeneous decision makers rather than only optimistic ones. Instead of computing the optimistic utility $U^{+}_i$ for all agents, we compute $U^{\otimes}_i$ for each one of them depending on the decision maker's attitude ($U^{\otimes}_i = U^{+}_i$ (resp. $U^{-}_i$) if the agent is optimistic (resp. pessimistic)). The modifications concern lines 9 and 21 of the UpperBound function (Algorithm 5), which are respectively replaced by:
line 9': u_N ← (u_N ⊗ (λ_Y ⊕ u_Y)); // if i is Optimistic: ⊗ = max, λ_Y = π(Y), ⊕ = min; if i is Pessimistic: ⊗ = min, λ_Y = 1 − π(Y), ⊕ = max.
line 21': U_j(δ_i) ← Util(δ_i, j); // Computes for each agent j the value of its optimistic or pessimistic utility w.r.t. his/her attitude.
5. Experiments
This last section aims at testing the feasibility of the exact algorithms proposed, namely (i) Dynamic Programming for $U^{+\max}_{post}$ and $U^{-\min}_{post}$, and also for $U^{+\max}_{ante}$ and $U^{-\min}_{ante}$ because the latter coincide with the former, (ii) Multi-Dynamic Programming for $U^{-\max}_{ante}$ (purely pessimistic agents) and $U^{\max}_{ante}$ (heterogeneous agents), and (iii) Branch and Bound for $U^{+\min}_{ante}$ (purely optimistic agents) and $U^{\min}_{ante}$ (heterogeneous agents).
Beyond a proof of feasibility of these algorithms, our experiments aim at evaluating to what extent the optimization of the problematic (non-monotonic) utilities can be approximated by Dynamic Programming. For homogeneous agents, ex-post and ex-ante Dynamic Programming algorithms can indeed be used but come without guarantees of optimality; they can be considered as approximation algorithms. However, for heterogeneous agents, the ex-post approach is meaningless and only ex-ante Dynamic Programming shall be considered for approximation purposes.
The implementation has been done in Java, on an Intel Core i7 2670 QM CPU, 2.2 GHz, with 6 GB of RAM. The experiments were performed on complete binary decision trees. We have considered five sets of problems, the number of decisions to be made in sequence (denoted seq) varying from 2 to 6, with an alternation of decision and chance nodes: at each decision level $l$ (i.e., odd level), the tree contains $2^{l-1}$ decision nodes followed by $2^{l}$ chance nodes.³ In the present experiments, the number of agents is set equal to 6 (for heterogeneous agents cases, we set 3 optimistic and 3 pessimistic agents). The utility values as well as the weight degrees are uniformly drawn from the set {0, 0.1, 0.2, ..., 0.9, 1}. Conditional possibilities are chosen randomly in [0, 1] and normalized. Each of the five samples of problems contains 1000 randomly generated trees.
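The shape of these benchmark trees can be sketched as follows. The code is only an illustration of the protocol described above, not the Java generator used in the experiments: the tuple encoding and function names are hypothetical, and dividing by the largest degree is one assumed way of normalizing possibilities (so that the most plausible branch gets degree 1).

```python
import random

GRID = [k / 10 for k in range(11)]  # the set {0, 0.1, ..., 0.9, 1}


def decision_node_count(seq):
    """The k-th decision level is tree level l = 2k - 1 and holds 2^(l-1) nodes."""
    return sum(2 ** (2 * k - 2) for k in range(1, seq + 1))


def random_possibilities(n, rng):
    """n degrees in [0, 1], rescaled so that the largest one equals 1."""
    raw = [rng.random() for _ in range(n)]
    top = max(raw) or 1.0  # guard against an all-zero draw
    return [r / top for r in raw]


def random_tree(seq, agents, rng):
    """Complete binary tree with seq alternations of decision and chance nodes."""
    if seq == 0:
        return ("leaf", tuple(rng.choice(GRID) for _ in range(agents)))
    chances = []
    for _ in range(2):  # two actions per decision node
        degrees = random_possibilities(2, rng)  # two outcomes per chance node
        chances.append(("chance", [(pi, random_tree(seq - 1, agents, rng))
                                   for pi in degrees]))
    return ("decision", chances)


def count_decisions(node):
    if node[0] == "leaf":
        return 0
    if node[0] == "chance":
        return sum(count_decisions(child) for _, child in node[1])
    return 1 + sum(count_decisions(child) for child in node[1])


rng = random.Random(0)
tree = random_tree(3, 6, rng)
weights = [rng.choice(GRID) for _ in range(6)]
print([decision_node_count(s) for s in range(2, 7)], count_decisions(tree))
```

The counts obtained for seq = 2, ..., 6 match the tree sizes reported in Table 1.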
5.1. Feasibility analysis and temporal performances
Table 1 presents, for each criterion, the execution time of each possible algorithm. Obviously, whatever the algorithm, the CPU time increases with the size of the tree. Dynamic Programming is always below the threshold of 1 ms, while the Branch and Bound algorithms are more expensive (up to 16 ms) but remain affordable even for big trees (1365 decision nodes).
For tricky (non-monotonic) decision rules, both the exact algorithm(s) and the approximation algorithm(s) are presented. It can be checked that for these rules the approximation by Dynamic Programming is always faster than exact algorithms. Unsurprisingly, and whatever the rule tested, the ex-ante Dynamic Programming is slightly slower than the ex-post Dynamic Programming, both remaining far below the millisecond in any case. Finally, as to the optimization of $U^{\max}_{ante}$, the experimental results verify that ImproHetMultiDynProg is quicker than HetMultiDynProg, both being exact algorithms.
Furthermore, to study the effects of varying the number of agents, we consider the optimization of $U^{-\max}_{ante}$ and $U^{\max}_{ante}$, for reasonable trees (341 decision nodes) with $p$ agents from 3 to 10, using the more time-consuming algorithm (Branch and Bound). As shown in Table 2, the average CPU time with 3 agents is about 3 milliseconds for $U^{-\max}_{ante}$ and about 4 milliseconds for $U^{\max}_{ante}$. The maximal CPU time for decision trees with 10 agents is less than 11 milliseconds in both cases. Thus, we can say that the results are good enough to allow the handling of real-size problems.
Table 1
Average CPU time, in milliseconds, according to the size of the tree (in number of decision nodes).

Criterion                                  Algorithm                        # of decision nodes
                                                                            5        21       85       341      1365
$U^{-\min}_{post}$, $U^{-\min}_{ante}$     Post Dyn. Prog.                  0.022    0.026    0.038    0.052    0.106
$U^{+\max}_{post}$, $U^{+\max}_{ante}$     Post Dyn. Prog.                  0.024    0.030    0.043    0.060    0.117
$U^{-\max}_{post}$                         Post Dyn. Prog.                  0.025    0.027    0.039    0.053    0.112
$U^{+\min}_{post}$                         Post Dyn. Prog.                  0.026    0.028    0.041    0.059    0.110
$U^{-\max}_{ante}$                         Multi Dyn. Prog.                 0.063    0.074    0.102    0.129    0.605
$U^{-\max}_{ante}$                         Ante Dyn. Prog.                  0.049    0.065    0.93     0.102    0.446
$U^{+\min}_{ante}$                         Branch & Bound                   0.359    0.794    2.044    6.095    14.198
$U^{+\min}_{ante}$                         Ante Dyn. Prog.                  0.032    0.063    0.090    0.114    0.534
$U^{\max}_{ante}$                          Het. Multi. Dyn. Prog.           0.068    0.073    0.114    0.136    0.319
$U^{\max}_{ante}$                          Impro. Het. Multi. Dyn. Prog.    0.047    0.058    0.079    0.124    0.187
$U^{\max}_{ante}$                          Ante Dyn. Prog.                  0.053    0.065    0.096    0.149    0.217
$U^{\min}_{ante}$                          Het. Branch & Bound              0.420    0.972    2.708    8.483    16.356
$U^{\min}_{ante}$                          Ante Dyn. Prog.                  0.051    0.071    0.109    0.131    0.206

Table 2
Average CPU time (in milliseconds) for $U^{-\max}_{ante}$ and $U^{\max}_{ante}$ using Branch and Bound algorithms (B&B and Het. B&B) for trees with 341 decision nodes.

# of agents           3        4        5        6        7        8        9        10
$U^{-\max}_{ante}$    3.596    4.394    5.344    6.056    7.137    7.840    8.534    9.204
$U^{\max}_{ante}$     4.520    5.444    7.023    7.824    8.785    9.521    10.457   10.987

5.2. Quality of approximation of exact algorithms by Dynamic Programming
As previously said, $U^{-\max}_{ante}$ and $U^{+\min}_{ante}$, relative to homogeneous agents, and $U^{\max}_{ante}$ and $U^{\min}_{ante}$, for heterogeneous ones, are not monotonic. For both cases, right optimization is performed using Multi-Dynamic Programming for max-oriented aggregation utilities and Branch and Bound for min-oriented ones. For these criteria, Dynamic Programming algorithms can nevertheless be considered as approximation algorithms. The following experiments estimate the quality of these approximations. To this end, we compute for each sample the success rate of the considered approximation algorithm, i.e., the proportion of trees for which the value provided by the approximation algorithm is actually optimal (i.e., equal to the one computed by the exact algorithm); then, for the trees for which the approximation algorithm fails to reach optimality, we report the average closeness value $U_{Approx} / U_{Exact}$, where $U_{Approx}$ is the utility of the strategy provided by the approximation algorithm and $U_{Exact}$ is the optimal utility, i.e., the one of the solution provided by the exact algorithm: namely, the Branch and Bound algorithm for $U^{+\min}_{ante}$ and its adaptation for the generalized criterion $U^{\min}_{ante}$, and Multi-Dynamic Programming for $U^{-\max}_{ante}$ and its generalization for $U^{\max}_{ante}$. The results are given in Tables 3 and 4.

Table 3
Quality of approximation of exact algorithms Multi Dyn. Prog. (for $U^{-\max}_{ante}$) and B&B (for $U^{+\min}_{ante}$) by ex-ante and ex-post Dyn. Prog.

Criterion             Algorithm          # of decision nodes
                                         5        21       85       341      1365
% of success
$U^{-\max}_{ante}$    Ante Dyn. Prog.    16.1%    19.8%    23.7%    27.1%    31.9%
$U^{-\max}_{ante}$    Post Dyn. Prog.    17%      24.2%    28.9%    33.7%    39%
$U^{+\min}_{ante}$    Ante Dyn. Prog.    82%      78.6%    71%      65.4%    60.2%
$U^{+\min}_{ante}$    Post Dyn. Prog.    93.2%    91%      89.3%    87.5%    84.7%
Closeness value
$U^{-\max}_{ante}$    Ante Dyn. Prog.    0.49     0.54     0.62     0.71     0.80
$U^{-\max}_{ante}$    Post Dyn. Prog.    0.47     0.51     0.59     0.69     0.73
$U^{+\min}_{ante}$    Ante Dyn. Prog.    0.96     0.94     0.92     0.91     0.90
$U^{+\min}_{ante}$    Post Dyn. Prog.    0.97     0.96     0.95     0.94     0.93
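The two indicators defined in this section (success rate and average closeness value over the failed instances) can be computed as in the following sketch. The function name and the sample values are hypothetical; they are not experimental data from the paper.

```python
def approximation_quality(approx_values, exact_values):
    """Return (success rate in %, average U_Approx / U_Exact over failures).

    The closeness value is None when the approximation is optimal on every tree.
    """
    pairs = list(zip(approx_values, exact_values))
    failures = [(a, e) for a, e in pairs if a != e]
    success_rate = 100.0 * (len(pairs) - len(failures)) / len(pairs)
    if not failures:
        return success_rate, None
    closeness = sum(a / e for a, e in failures) / len(failures)
    return success_rate, closeness


# Made-up values for four trees: optimal on two of them, sub-optimal on two.
approx = [0.8, 0.5, 0.4, 1.0]
exact = [0.8, 1.0, 0.8, 1.0]
print(approximation_quality(approx, exact))
```

Since the exact value is an optimum, a failure always has $U_{Exact} > U_{Approx} \geq 0$, so the ratio is well defined.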