Belief Merging Operators as Maximum Likelihood Estimators (extended version including the proofs of propositions)
Patricia Everaere
1, Sébastien Konieczny
2, Pierre Marquis
2,31
CRIStAL – CNRS, Université de Lille, France
2
CRIL – CNRS, Université d’Artois, France
3
Institut Universitaire de France, France
[email protected], [email protected], [email protected]
Abstract
We study how belief merging operators can be con- sidered as maximum likelihood estimators, i.e., we assume that there exists a (unknown) true state of the world and that each agent participating in the merging process receives a noisy signal of it, char- acterized by a noise model. The objective is then to aggregate the agents’ belief bases to make the best possible guess about the true state of the world. In this paper, some logical connections between the rationality postulates for belief merging (IC postu- lates) and simple conditions over the noise model under consideration are exhibited. These results provide a new justification for IC merging pos- tulates. We also provide results for two specific natural noise models: the world swap noise and the atom swap noise, by identifying distance-based merging operators that are maximum likelihood es- timators for these two noise models.
1 Introduction
The aggregation of pieces of information, encoding prefer- ences, beliefs or judgments, that are owned by a number of agents, is a key challenge in many settings, among social choice, vote, merging, or judgment aggregation. Whatever the setting, the design of aggregation operators is typically guided either by axiomatic concerns, or by epistemic con- cerns. In the former case, expected properties (postulates) are exhibited, and operators satisfying them are pointed out or impossibility theorems are proved. In the latter case, it is assumed that the preferences, beliefs or judgments that are reported by the agents are a noisy perception of “cor- rect” preferences, beliefs or judgments. Such an epistemic approach have received great attention when dealing with preferences [Young, 2003; Conitzer and Sandholm, 2005;
Xiaet al., 2010; Conitzer, 2014; Elkind and Slinko, 2016;
Pivato, 2012], and can be traced back at least to Condorcet, who justified the use of majoritarian vote in this way (his fa- mous jury theorem [Condorcet, 1785]). Contrastingly, the epistemic interpretation of belief merging operators gave rise to few research work (we know a single paper about it, it is discussed in the related work section).
Propositional belief merging operators aim to associate a belief base with a set of (usually conflicting) belief bases rep- resenting the individual beliefs of the agents. So far, belief merging operators have been evaluated with respect to several criteria, such as rationality properties leading to the so-called IC merging operators [Konieczny and Pino Pérez, 2002a;
Konieczny and Pino Pérez, 2011], properties inspired by vote [Haret et al., 2016], computational complexity [Konieczny et al., 2004], strategy-proofness [Everaereet al., 2007], etc.
In this work, we want to evaluate belief merging operators asmaximum likelihood estimators (MLE). One assumes that there exists a true state of the worldω?(represented by a spe- cific propositional interpretation) and that each agent partici- pating in the merging process receives a noisy signal of it (a formula Ki). The objective is then to aggregate the agents’
belief bases to make the best possible guess aboutω?. The key issue we want to address is to determine the extent to which belief merging operators are suited to this objective, i.e., to figure out how the corresponding merged bases and ω?are connected.
In the following, we provide a number of results in this di- rection. First, we present some logical connections between rationality postulates for merging (IC postulates) and sim- ple conditions (especially about independence) over the noise model P under consideration. We also rationalize a large family of distance-based IC merging operators ∆ by show- ing that for each operator∆of the family, there exists a noise model P such that∆is an MLE forP. Then we focus on two specific noise models. For the first one, the world swap noise, we show that a standard distance-based merging op- erator based on the drastic distance is an MLE. And for the second one, the atom swap noise, we show that the distance- based merging operator based on the sum and the Hamming distance is an MLE. Finally, we report some empirical results showing that those two distance-based merging operators are efficient noise cancellers in practice, meaning that the num- ber of agents needed for identifying the true state of the world with high probability remains small enough.
The rest of the paper is organized as follows. In the next section, some formal preliminaries on belief merging are pro- vided. In Section3, noise models and maximum likelihood estimators are defined in the belief merging setting. In Sec- tion4, conditions under which merging operators are / are not maximum likelihood estimators are pointed out. In Section5
the focus is laid on the world swap noise, and in Section6 on the atom swap noise. Section7gives some empirical re- sults. Section8discusses some related work. Finally, Section 9concludes the paper. The proofs of the propositions are re- ported in an appendix.
2 Formal Preliminaries
We consider a propositional languageLdefined from a finite setP ofmpropositional variables and the usual connectives.
An interpretation (or state of the world)ω is a total function fromP to{0,1}. Ωis the set of all interpretations. An in- terpretationωis a model of a formulaφ ∈ Lif and only if it makes it true in the usual truth-functional way. Otherwise, ω is a counter-model ofφ. [φ]denotes the set of models of formulaφ, i.e.,[φ] ={ω∈Ω|ω|=φ}.
Abelief baseK is a finite set of propositional formulae, interpreted conjunctively (i.e., viewed as the conjunction of its elements). AprofileE(of belief bases) is associated with a group ofnagents that are involved in the merging process;
formally,Eis given by a vector of basesE=hK1, . . . , Kni.
VE denotes the conjunction of all elements of E, and t denotes concatenation. Two profiles E = hK1, . . . , Kni and E0 = hK10, . . . , Kn0i are said to be equivalent, noted E ≡ E0, iff there exist a permutationσsuch that∀Ki, we have[Ki] = [Kσ(i)0 ]. Anintegrity constraintµis a formula re- stricting the possible results of the merging process. Amerg- ing operator4is a mapping which associates with a profile Eand an integrity constraintµa (merged) base4µ(E).
We expect belief merging operators to satisfy the following rationality postulates (the IC postulates) [Konieczny and Pino Pérez, 2002a]:
Definition 1 (IC merging operator) A merging operator4 is anIC merging operatoriff it satisfies the following proper- ties:
(IC0) 4µ(E)|=µ
(IC1) Ifµis consistent, then4µ(E)is consistent
(IC2) IfVEis consistent withµ, then4µ(E)≡VE∧µ (IC3) IfE1≡E2andµ1≡µ2, then4µ1(E1)≡ 4µ2(E2) (IC4) IfK1 |=µandK2|=µ, then4µ(hK1, K2i)∧K1is
consistent if and only if4µ(hK1, K2i)∧K2is consistent (IC5) 4µ(E1)∧ 4µ(E2)|=4µ(E1tE2)
(IC6) If4µ(E1)∧ 4µ(E2)is consistent, then4µ(E1tE2)|=4µ(E1)∧ 4µ(E2) (IC7) 4µ1(E)∧µ2|=4µ1∧µ2(E)
(IC8) If4µ1(E)∧µ2is consistent, then 4µ1∧µ2(E)|=4µ1(E)∧µ2
See [Konieczny and Pino Pérez, 2002a] for explanations and justifications of these properties.
Let us give some examples of IC merging operators from the family of distance-based merging operators [Koniecznyet al., 2004]. We first need a notion of (pseudo-)distance:
Definition 2 (pseudo-distance) A (pseudo-)distance be- tween interpretations is a mappingd : Ω×Ω → IR+such that for anyω1,ω2∈Ω:
• d(ω1, ω2) =d(ω2, ω1) (symmetry)
• d(ω1, ω2) = 0iffω1=ω2 (separation) Every distance between interpretations can be easily lifted to a “distance” between worlds and formulae, by stating that d(ω, K) =minω0∈[K]d(ω, ω0).
Usual distances considered in merging [Konieczny and Pino Pérez, 2002a] are the Hamming distance dH and the drastic distance dD. dH(ω1, ω2)is the number of proposi- tional letters on which the two interpretations differ (this cor- responds to the 1-norm distance, also referred to as the Man- hattan distance).dDis such thatdD(ω1, ω2) = 0iffω1=ω2, and= 1otherwise (this corresponds to the infinity-norm dis- tance, also known as Chebyshev distance).
One then needs a notion of aggregation function:
Definition 3 (aggregation function) An aggregation func- tionis a mapping1f from(IR+)ptoIR+, which satisfies:
• ifxi≥x0i,
thenf(x1, . . . , xi, . . . , xp)≥f(x1, . . . , x0i, . . . , xp) (non-decreasingness)
• f(x1, . . . , xp) = 0if∀i∈ {1, . . . , p}, xi= 0
(minimality)
• f(x) =x (identity)
• Ifσis a permutation over{1, . . . , p}, then
f(x1, . . . , xp) =f(xσ(1), . . . , xσ(p)) (symmetry) Many aggregation functions have been pointed out so far, and considered in the belief merging literature. They include sum(Σ),leximin (aliasGmin),leximax(aliasGmax), and Σk (the sum of thekthpowers – wherekis a fixed integer).
Formal definitions can be found, e.g., in [Konieczny and Pino Pérez, 2002a;Everaereet al., 2010b].
Definition 4 (distance-based merging operator) Letdand f be respectively a pseudo-distance between interpretations and an aggregation function. Thedistance-based merging op- erator4d,f is defined by[4d,fµ (E)] = min([µ],≤E), where the total pre-order≤EonΩis defined in the following way (withE=hK1, . . . , Kni):
• ω≤Eω0iffd(ω, E)≤d(ω0, E),
• d(ω, E) =f(d(ω, K1), . . . , d(ω, Kn)).
For many standard aggregation functions, whatever the chosen pseudo-distanced between interpretations, the cor- responding distance-based operator4d,f is an IC merging operator [Konieczny and Pino Pérez, 2002a;Everaereet al., 2010b]:
Proposition 1 Letdbe any pseudo-distance between inter- pretations and let f be an aggregation function amongΣ, leximin,leximax, orΣk.4d,fis an IC merging operator.
More generally, properties on the aggregation function en- suring that the corresponding distance-based operator is an IC merging operator can be found in [Koniecznyet al., 2004].
1Strictly speaking, it is a family of mappings, where a mapping is defined for each positive integerp.
3 Noise and Maximum Likelihood Estimators
In this work, we assume the existence of a true state of the world, that is represented by an interpretationω?∈Ω. When- ever some integrity constraints µ are available, we assume thatω?is a model ofµ, meaning that the integrity constraints are always supposed to hold in the true state of the world.
Stated otherwise, integrity constraints are pieces of knowl- edge, i.e., of true beliefs.
LetIbe a set ofnagents. We suppose that each agent has a noisy perception of the true state of the worldω?: for every i ∈ I, the baseKi representing the beliefs of agenticorre- sponds to the perception of ω? byi. The given (observed) profileE =hK1, . . . , Knithus reflects the noisy perception of the true state of the world by the group of agents.
Because of the noise, there is no guarantee that [Ki] = {ω?} for at least one agenti, or even that there exists any logicalconnection betweenEandω?(for instance, it can be the case thatω? is inconsistent with each Ki inE). How- ever, depending onω?, the observation of a profileEis more or less plausible given the noise. Formally, a noise model makes precise the connection betweenω?and the profiles us- ing probabilities.
Given the worldsω ∈ Ωand the agentsi ∈ I, one con- siders2m×nBoolean random variables of the form(ω, i).
Each random variable(ω, i)is thus associated with two pos- sible outcomes, the one noted(ω, i), where the random vari- able(ω, i)takes the truth value true, and the one noted(ω, i), where the random variable(ω, i)takes the truth value false.
The fact that (ω, i) takes the truth value true (resp. false) means that the world ω is not discarded (resp. discarded) by the observation achieved by agent i, i.e., ω is a model of Ki (resp. not a model ofKi). O denotes the Cartesian product of the set of outcomes for all the Boolean random variables(ω, i). Every element ofO represents (from a se- mantical point of view) a profileEsince it indicates for every worldω ∈ Ωand every agenti ∈ I whether or notω is a model ofKi. Similarly, every projection of an element ofO over the outcomes associated with all the random variables of the form(ω, i)for a fixedi ∈ I represents (again, up to logical equivalence) a belief baseKi. Given those notations, a noise model can be formally defined as follows:
Definition 5 (noise model) Anoise modelP is a mapping associating with every consistent formulaµ ∈ Land every interpretationω ∈ [µ]a joint probability distribution, noted Pµ,ω, of the2m×n Boolean random variables(ω, i), i.e., Pµ,ω is a probability distribution over O, such that the fol- lowing conditions are satisfied:
• ∀ω0∈Ω, ifω06|=µ, then∀i∈ I,Pµ,ω((ω0, i)) = 0,
• Ifµ0≡µ, thenPµ,ω =Pµ0,ω.
Since each Pµ,ω is a joint probability distribution, every (non-empty) subset of the2m×nBoolean random variables of the form(ω, i)corresponds to a marginal probability dis- tribution.
The first condition in Definition5states that any interpreta- tionω0that violates the given integrity constraintµwill never be reported by any agent as a model of her belief base (stated
otherwise, every agent involved in the merging process is sup- posed to be aware ofµ). Thus, the (marginal) probability of such an (impossible) event(ω0, i)for the random variable (ω0, i)is0. The second condition in the definition expresses a form of syntax-independence for the integrity constraint.
A profile Eis said to be generated by the noise modelP ifEis a random variate (outcome) fromP. In this work, we observe a single profileE generated byP, and we want to determine whether a merging operator allows to identify the worlds that best explainE, i.e., to find (if it exists) a merging operator that is a maximum likelihood operator forP: Definition 6 (maximum likelihood estimator) LetP be a noise model. A merging operator∆P is amaximum likeli- hood estimator (MLE) forP iff for any consistent formula µ ∈ L and for any profile E generated by P, we have2 [∆Pµ(E)] =argmaxωPµ,ω(E).
4 IC Postulates, IC Operators and MLEs
We first show that if a merging operator ∆ is an MLE for some noiseP(whatever this noise), then∆must comply with some IC postulates.
Proposition 2 LetP be any noise model. If∆is an MLE for P, then∆satisfies(IC0),(IC1),(IC3),(IC7)and (IC8).
In general, we can expect noise models to satisfy some ad- ditional constraints, like the following independence condi- tion, which is very natural (and is quite standard for voting methods):
Definition 7 (independence) A noise modelP satisfies the independence conditioniff for any for any consistent formula µ ∈ L, anyω ∈ [µ], for eachi ∈ I, the set of2mrandom variables{(ω, i)|ω∈Ω}considered inPµ,ωis independent from all the remaining random variables.
WhenPsatisfies the independence condition, whatever the true state of the world, the beliefs of any agent are indepen- dent from the beliefs of the other agents.
The independence condition rules out some merging oper- ators as MLEs:
Proposition 3 LetPbe any noise model satisfying the inde- pendence condition. If ∆is an MLE forP, then∆satisfies (IC5)and(IC6).
To sum up, a merging operator that is an MLE for a noise model satisfying the independence condition must satisfy al- most all IC postulates. The exceptions are(IC2)(that requires the merged base to be the conjunction of all the bases with the integrity constraints whenever this conjunction is consistent), and(IC4)(that requires an equal treatment of the two bases in any profile containing two bases).
In the literature (see for example [Conitzer and Sandholm, 2005]), another assumption about noise is often considered, namely the uniformity condition:
Definition 8 (uniformity) A noise model P satisfies the uniformity condition iff for any for any consistent formula µ ∈ L, anyω∈[µ], anyi, j∈ I, and anyω0 ∈Ω, we have Pµ,ω((ω0, i)) =Pµ,ω((ω0, j)).
2argmaxωPµ,ω(E) = {ω ∈ Ω|∀ω0 ∈ Ω, Pµ,ω(E) ≥ Pµ,ω0(E)}.
Intuitively, the uniformity condition states that the noise model is the same one for all the agents.
Interestingly, many merging operators (especially IC ones) can be interpreted as MLEs for a noise model satisfying the independence condition and the uniformity condition:
Proposition 4 Let dbe any pseudo-distance on interpreta- tions. Let k be any natural number. There exists a noise modelP satisfying the independence condition and the uni- formity condition such that the merging operators∆d,Σkand
∆d,leximax are MLEs forP.
Proposition4shows that many distance-based merging op- erators∆ (namely those for which the aggregation function is one of theΣk orleximax) are rationalizable in the sense that there exists a noise modelP such that∆is an MLE for P.
The fact that a merging operator can be rationalized as an MLE (i.e., showing that there exists a noise modelPsuch that the operator is an MLE forP) is a valuable property of the operator when one is interested in making the best possible guess about ω?. Indeed, as already discussed in [Everaere et al., 2010a], merging a profile of belief bases is a useful operation when one wants to synthesize the information given in the profile, i.e., to characterize a belief base which best represents the beliefs of the input profile (the synthesis view of belief merging), or when one wants to best approximate the true state of the world from the information give in the profile (the epistemic view of belief merging). In the second case, rationalizable merging operators must clearly be preferred to non-rationalizable ones. The rationalization property can thus be viewed as an interesting criterion for helping to choose a merging operator that is suited to the epistemic purpose.
In some situations, more information about the noise modelP are available, and an interesting issue is then to de- termine whether a merging operator that is an MLE for the specific noise model P exists. This is a research question that is somewhat converse to the one discussed in the previ- ous paragraphs: instead of starting with a merging operator∆ and looking for the existence of a noise modelPsuch that∆ is an MLE forP, one starts with a noise modelP and looks for a merging operator∆that is an MLE forP.We address this issue in the rest of the paper. To be more precise, two sensible noise models are considered and merging operators that are suited to them are identified.
5 The World Swap Noise
A plausible noise model is obtained by assuming that some interpretations ω of Ω are misperceived (and thus misclas- sified) by the agents: for an agent i, when ω = ω? (resp.
ω6=ω?), it is considered as a counter-model (resp. a model) ofKi. This noise model is parameterized by a(world) swap probabilityp, which represents the chances of an interpreta- tion to be considered as a model of the beliefs of an agent while it is not the true state of the world. We make the (rea- sonable) assumption that this probability of error is less than
1
2, otherwise the noise would be too important. More for- mally, theworld swap noise modelPwsnis a mapping asso- ciating with every consistent formulaµ ∈ L, every interpre- tationω∈[µ], the joint probability distribution, notedPµ,ωwsn,
of the2m×nBoolean random variables(ω0, i)defined as the product of the distributions of the random variables (ω0, i) whenω0varies inΩandivaries inI:
Definition 9 (world swap noise) Let pbe a real number in [0,12), the (world) swap probability.
Pµ,ωwsn((ω0, i)) =
( 0 ifω0 6|=µ
p ifω0 |=µandω06=ω, 1−p otherwise.
Stated otherwise, one considers here that every Boolean random variable(ω, i)is associated with a Bernoulli distribu- tion, so that the joint distributionPµ,ωwsnfor the2m×nrandom variables is a binomial distribution with parametersnandp.
We can easily check that the world swap noise modelPwsn satisfies the independence and uniformity conditions. It turns out that this noise is linked to a pseudo-distance between for- mulae. The distance of two formulae is defined here as the number of interpretations one have to add or to retrieve to one of the two formulae to make it equivalent to the other for- mula. We call it theswap distancedswapbetween formulae.
Definition 10 (swap distance) Let φ, φ0 be two formulae fromL.dswap(φ, φ0) =|{ω∈Ω|ω|=φ⊕φ0}|.3
Example 1 Consider the formulaeφ = a∧b,φ0 = a∨b, φ00 = ¬awhereP = {a, b}. We have: dswap(φ, φ0) = 2, dswap(φ, φ00) = 3, and dswap(φ0, φ00) = 3. According to dswap,a∧bis thus closer toa∨bthan to¬a.
Proposition 5 dswapis a pseudo-distance between formulae.
When the world swap noise modelPwsnis considered, for any modelωof the integrity constraintµand any baseKof E, the largestdswap(K, ω), the lessPµ,ωwsn(K). As a conse- quence:
Proposition 6 Letωbe any model of the integrity constraint µ, φ be any formula having ω as its single model, and let Ki, Kj be two belief bases of E. If dswap(Ki, φ) ≥ dswap(Kj, φ), thenPµ,ωwsn(Ki)≤Pµ,ωwsn(Kj).
On this ground, we can show that the drastic merging op- erator∆dD,Σis an MLE for this noise:
Proposition 7 ∆dD,Σis an MLE forPwsn.
6 The Atom Swap Noise
Another interesting noise is the atom swap noisePv, obtained by considering that the observed values of variables fromP may differ from their values inω?. In such a case, the prob- ability that an interpretation belongs to a base decreases with the Hamming distance between this interpretation and ω?. Such a noise model is suited for instance to scenarios where the observed values of the variables come from noisy sensors.
Pvis parameterized by a probabilityp <1, that indicates the probability that the truth value of any observed variable v∈ P differs from the actual one (i.e., its value inω?). For- mally theatom noise modelPvis a mapping associating with every consistent formulaµ∈ L, every interpretationω∈[µ], a joint probability distribution, notedPµ,ωv , of the 2m ×n
3⊕is the exclusive disjunction connective.
Boolean random variables(ω0, i)defined as the product of the distributions of the random variables(ω0, i)whenω0varies in Ωandivaries inI:
Definition 11 (atom swap noise) Letpbe a real number in (0,1), the (atom) swap probability.
Pµ,ωv ((ω0, i)) =
0 ifω06|=µ pdH(ω,ω0)+1 otherwise.
So, with this noise model, the true state of the worldω? appears in any belief base with a probabilityp, and any other world has a smaller probability, which exact value depends on its distance toω?.
Clearly enough, Pv satisfies the independence condi- tion (by definition) and the uniformity condition (since Pµ,ωv ((ω0, i))does not depend oni).
Let us illustrate the behavior of this noise model on an ex- ample, with three variables (m= 3) and a profile forn= 5 agents.
Example 2 Suppose that the true state of the world ω? is ω? = 001, with a swap noise Pv such that p = 30%.
For 5 agents, a possible outcome given that noise is the profile E = hK1, K2, K3, K4, K5i with [K1] = {011}, [K2] = {110,001,111,101}, [K3] = {000,001,111}, [K4] = {000,110,001}, [K5] = {001,111}. In this case, the resulting merged base with ∆dH,Σ from this profile is [∆dH,Σ(E)] ={001}.
One can show that:
Proposition 8 ∆dH,Σis an MLE forPv.
7 Experimental Results
Let us turn now to a more practical issue: determining the extent to which the distance-based merging operators iden- tified as maximum likelihood estimators for the world swap noise and for the atom swap noise are efficient noise can- cellers in practice. This requires to evaluate the number of agents needed for getting a merged base equals to the true state of the world with high probability. To this end, we per- form some experiments. We report hereafter the obtained re- sults for the world swap noise and the atom swap noise.
In our experiments, the true state of the worldω?has been generated by considering every variablevfromP in an iter- ative way, assigning v to1 with probability 12. We did not assume any integrity constraint to be fulfilled (i.e.,µis sup- posed to be a valid formula). Then, for several values of the parameterp, the noise modelPwsn (resp. Pv) has been ap- plied toω?in order to generate1000profilesEconsisting of nbases for increasing values ofn.
Each resulting profile has then been merged using∆dD,Σ for the world swap noise and using∆dH,Σfor the atom swap noise. A success has been obtained whenever[∆dD,Σ(E)] = {ω?}for the world swap noise and[∆dH,Σ(E)] ={ω?}for the atom swap noise. For each value ofn, the success rate was defined as the percentage of profiles for which a success has been obtained.
Figure1shows some of the results obtained for the world swap noise and the atom swap noise, for several values ofp
when m = 10. Though considering 10 propositional vari- ables may look not that high, one must keep in mind that2m possible worlds exist since no integrity constraint is consid- ered, so that the task is to identify a single world in a set of 1024worlds. To interpret the results correctly, one must re- mind that the noise increases with pwhen the world swap noise is considered, while it decreases withpwhen the atom swap noise is considered.
Our experimental results confirm the ability of the merg- ing operators∆dD,Σand∆dH,Σto “cancel” their respective noises. It is interesting to note that even for very noisy envi- ronments (p= 40%for the world swap noise andp= 30%
for the atom swap noise) the number of bases needed to obtain a high percentage of success (80%) remains reasonable (100 for the world swap noise and 45for the atom swap noise).
In addition, when the noise level is lower (p= 30%for the world swap noise andp= 60%for the atom swap noise), as few as 30 bases are enough for being nearly sure (>90%) to identify the true world.
One can observe that for the world swap noise, the number of agents to be considered for getting with high probability a merged base having the true state of the world as its sole model remains quite small, even when the noise level is high.
The empirical results also show that the noise cancellation task requires less effort (i.e., less agents) when dealing with the atom swap noise than when dealing with the world swap noise.
Finally, let us note that similar results have been obtained for the other values ofmconsidered in our experiments (we letmvary from2to15; the shapes of the resulting curves are close to those of the curves given in Figure1, the number of agents required for getting a success increasing withm).
8 Related Work
This work is related to the rationalization of voting rules as maximum likelihood estimators, as discussed in [Conitzer and Sandholm, 2005;Young, 1995;Xiaet al., 2010;Conitzer, 2014; Pivato, 2012]. Such an interpretation of voting rules requires to suppose that some candidates are better than oth- ers, in an absolute/objective sense, and that the agent’s pref- erences are noisy estimates of the “absolute” agents’ ranking.
The objective of voting is then to determine this best ranking from the agents’ votes.
In this paper, the focus is on belief bases, that are clearly distinct from rankings (in particular, the result one looks for is not a ranking of all interpretations but a single interpreta- tion). Thus, we have identified some connections between the MLE question and the standard logical postulates for merg- ing belief bases. Our work also departs significantly from the works reported in the references listed just above, because it is not centered on the issue of rationalizing an aggregation method by determining a noise that justifies it. Indeed, we have also been concerned by the converse research question (we defined two sensible noises and reported for each of them a corresponding MLE merging operator).
This work has also connections with the truth tracking problem [Bovens and Rabinowicz, 2006; Pigozzi and Hart- mann, 2007], that has been studied for belief merging in [Ev-
(a) World swap noise
(b) Atom swap noise
Figure 1: Success rate for several values ofp, withm= 10.
eraereet al., 2010a]. This problem consists in determining, when the agents are reliable enough, whether the true state of the worldω? can be identified in the limit (i.e., when the number of agents participating in the merging process tends to infinity) as the unique model of the corresponding merged base. In [Everaereet al., 2010a], an extension of Condorcet’s jury theorem to belief merging has been pointed out, giving a positive answer to the issue under some standard assumptions about the independence and the reliability of the agents.
The maximum likelihood estimator question differs from the truth tracking one in several aspects. In the truth tracking case,ω?is known and every agent is more or less reliable. In the maximum likelihood case,ω? is not known, agents have a noisy perception of it, and the corresponding noise model is more or less known. Furthermore, the observed profile is fixed (so the number of agents does not tend to infinity). So, while the truth tracking problem and the MLE one are related to the issue of providing an epistemic justification for belief merging operators, they offer complementary views.
9 Conclusion
In this work we carried out an evaluation of belief merg- ing operators as maximum likelihood estimators. We have shown close connections between noise models and postu- lates for IC merging. Especially, we have proved that a merg- ing operator that is an MLE for an independent and uni- form noise model has to satisfy most IC merging postulates ((IC0), (IC1), (IC3), (IC5), (IC6), (IC7), (IC8)). We have also shown that all distance-base merging operators based on theΣk aggregation functions are MLEs for some noise.
We have studied two specific, yet sensible noise models: the world swap noise and the atom swap noise. We have shown that some well-studied distance-based merging operators are MLEs for those noise models. Finally, we have reported re- sults from experiments showing that finding the true state of the world is feasible in practice for each of the two specific noises (the empirical results show that the number of agents required to identify the true state of the world remains rea- sonable). Based on those results, our work provides a new justification for IC merging operators, of epistemic nature.
This work calls for several perspectives for further re- search. One of them consists in identifying some additional pairs(P,∆)where∆is a maximum likelihood operator for the noise modelP. In application scenarios where merging beliefs is required, whenever the noise model can be guessed, a catalog of such pairs would be a useful methodological tool for deciding which belief merging operator to choose.
Acknowledgements
This work has benefited from the support of the AI Chair BE4musIA of the French National Research Agency (ANR).
References
[Bovens and Rabinowicz, 2006] Luc Bovens and Wlodek Rabinowicz. Democratic answers to complex questions.
Synthese, 150:131–153, 2006.
[Condorcet, 1785] Marquis de Condorcet. Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Imprimerie Royale, Paris, 1785.
[Conitzer and Sandholm, 2005] Vincent Conitzer and Tuo- mas Sandholm. Common voting rules as maximum likeli- hood estimators. InProceedings of the Twenty-First Con- ference on Uncertainty in Artificial Intelligence (UAI’05), pages 145–152, 2005.
[Conitzer, 2014] Vincent Conitzer. The maximum likeli- hood approach to voting on social networks. InInterna- tional Symposium on Artificial Intelligence and Mathemat- ics, ISAIM 2014, Fort Lauderdale, FL, USA, January 6-8, 2014, 2014.
[Elkind and Slinko, 2016] Edith Elkind and Arkadii Slinko.
Rationalizations of Voting Rules, chapter 8. Handbook of Computational Social Choice. Cambridge University Press, 2016.
[Everaereet al., 2007] Patricia Everaere, Sébastien Konieczny, and Pierre Marquis. The strategy-proofness
landscape of merging. Journal of Artificial Intelligence Research, 28:49–105, 2007.
[Everaereet al., 2010a] Patricia Everaere, Sébastien Konieczny, and Pierre Marquis. The epistemic view of belief merging: Can we track the truth? InProceedings of the Twenty-Ninth European Conference on Artificial Intelligence (ECAI’10), pages 621–626, 2010.
[Everaereet al., 2010b] Patricia Everaere, Sébastien Konieczny, and Pierre Marquis. Disjunctive merging:
Quota and Gmin merging operators.Artificial Intelligence, 174(12-13):824–849, 2010.
[Haretet al., 2016] Adrian Haret, Andreas Pfandler, and Ste- fan Woltran. Beyond IC postulates: Classification crite- ria for merging operators. In Proceedings of the Twenty Second European Conference on Artificial Intelligence (ECAI’16), pages 372–380, 2016.
[Konieczny and Pino Pérez, 2002a] Sébastien Konieczny and Ramon Pino Pérez. Merging information under constraints: a logical framework. Journal of Logic and Computation, 12(5):773–808, 2002.
[Konieczny and Pino Pérez, 2002b] Sébastien Konieczny and Ramon Pino Pérez. On the frontier between ar- bitration and majority. In Proceedings of the Eighth International Conference on Principles of Knowledge Representation and Reasoning (KR’02), pages 109–118, 2002.
[Konieczny and Pino Pérez, 2011] Sébastien Konieczny and Ramon Pino Pérez. Logic based merging. Journal of Philosophical Logic, 40(2):239–270, 2011.
[Koniecznyet al., 2004] Sébastien Konieczny, Jérôme Lang, and Pierre Marquis. DA2 merging operators. Artificial Intelligence, 157:49–79, 2004.
[Pigozzi and Hartmann, 2007] Gabriella Pigozzi and Stephan Hartmann. Judgment aggregation and the prob- lem of truth-tracking. In Proceedings of the Eleventh International Conference on Theoretical Aspects of Rationality and Knowledge (TARK’07), pages 248–252, 2007.
[Pivato, 2012] Marcus Pivato. Voting rules as statistical esti- mators. Social Choice and Welfare, 40:581–630, 2012.
[Xiaet al., 2010] Lirong Xia, Vincent Conitzer, and Jérôme Lang. Aggregating preferences in multi-issue domains by using maximum likelihood estimators. In Proceed- ings of the Ninth International Conference on Autonomous Agents and Multiagent Systems (AAMAS’10), pages 399–
408, 2010.
[Young, 1995] Hobart Peyton Young. Optimal voting rules.
Journal of Economic Perspectives, 9(1):51–64, 1995.
[Young, 2003] Hobart Peyton Young. Condorcet’s the- ory of voting. The American Political Science Review, 82(4):1231–1244, 2003.
Proofs
Proposition 2 LetP be any noise model. If∆is an MLE for P, then∆satisfies(IC0),(IC1),(IC3),(IC7)and (IC8).
Proof:
(IC0): By definition, if ∆ is a MLE for P, then for any profileEoverI,[∆µ(E)] =argmaxωPµ,ω(E). Since Pµ,ω(E)is defined only for the worldsωthat are models ofµ, we get that∆µ(E)|=µ, so(IC0)is satisfied.
(IC1): Let∆ be a MLE forP. Sinceµis supposed to be consistent, and since∆ satisfies(IC0), it is enough to prove that for any profileEoverI,argmaxωPµ,ω(E)is not empty. This is obvious since[µ]is a finite set.
(IC3): Obvious from Definition5(the random variables that are used ensure that the syntactic nature of the bases has no impact on the probability of the corresponding pro- file, and the second condition in the definition guarantees that the syntactic of the integrity constraint is irrelevant as well).
(IC7): Let ω be any model of ∆µ1(E)∧ µ2. Since ∆ is a MLE for P, we have ω ∈ argmaxω0Pµ1,ω0(E).
Sinceω |= µ2and sinceµ1∧µ2 |= µ1, we thus have ω ∈argmaxω0Pµ1∧µ2,ω0(E). Henceω |= ∆µ1∧µ2(E), showing that(IC7)is satisfied.
(IC8): Suppose that ∆µ1(E) ∧ µ2 is consistent and let ω be any model of ∆µ1∧µ2(E). Then ω ∈ argmaxω0Pµ1∧µ2,ω0(E). Since ∆ satisfies (IC0), we have that ω |= µ1 ∧ µ2. Now, if ω is a model of
∆µ1(E)∧µ2, the conclusion follows. Towards a con- tradiction, assume that the remaining case, i.e.,ωis not a model of∆µ1(E)∧µ2, is feasible. Sinceω |= µ2, we must have ω 6∈ argmaxω0Pµ1,ω0(E). Hence there must exist a worldω”such thatPµ1,ω”(E)> Pµ1,ω(E) and we can find such a world ω” among the mod- els of ∆µ1(E)∧µ2 since∆µ1(E)∧µ2 is consistent.
But then, since ω” is a model of µ2, we also have Pµ1∧µ2,ω”(E) > Pµ1∧µ2,ω(E), which contradicts the fact that ω ∈ argmaxω0Pµ1∧µ2,ω0(E). Hence(IC8)is satisfied.
2 Proposition 3 LetPbe any noise model satisfying the inde- pendence condition. If∆is an MLE forP, then∆satisfies (IC5)and(IC6).
Proof:
(IC5): Towards a contradiction, suppose that ∆ does not satisfy (IC5). Then there exist two profiles E1 and E2 of belief bases (associated with a bipartition ofI) such that∆µ(E1)∧∆µ(E2)6|= ∆µ(E1tE2). Since
∆µ(E1)∧∆µ(E2)is supposed to be consistent, there exists ω, a model of ∆µ(E1)∧∆µ(E2)that is not a model of∆µ(E1tE2). Thus, among the models ofµ,ω maximizesPµ,ω(E1)andPµ,ω(E2), hence it maximizes Pµ,ω(E1)×Pµ,ω(E2). But whenP satisfies the inde- pendence condition, we havePµ,ω(E1)×Pµ,ω(E2) =
Pµ,ω(E1tE2). Hence, ω is a model ofµ that max- imizes Pµ,ω(E1tE2), showing thatω is a model of
∆µ(E1tE2), contradiction.
(IC6): Let ω be any model of µ that is a model of
∆µ(E1)∧∆µ(E2)(for sure, such a model exists since
∆µ(E1)∧∆µ(E2)is supposed to be consistent). Then ωmaximizesPµ,ω(E1)andPµ,ω(E2), soωmaximizes Pµ,ω(E1)×Pµ,ω(E2). This shows thatω maximizes Pµ,ω(E1tE2)whenP satisfies the independence con- dition. Accordingly,ωis a model of∆µ(E1tE2), and this completes the proof.
2 Proposition 4 Let dbe any pseudo-distance on interpreta- tions. Let k be any natural number. There exists a noise modelP satisfying the independence condition and the uni- formity condition such that the merging operators∆d,Σkand
∆d,leximax are MLEs forP.
Proof: Let ω? |= µ be the true state of the world and E=hK1, . . . , Knia given profile. Let us define the noiseP such that the probability that any worldω is a model of the base associated with any agentj, namely Pµ,ω?((ω, j)), is equal toα× 1
2(d(ω,Kj)+1)k, whereαis a positive real number that does not depend onj(the uniformity assumption ensures that it exists) andkis a fixed integer.
Due to the independence condition onP, the probability ofω to be a model of every base ofEis thus equal to
Πnj=1 α
2(d(ω,Kj)+1)k = αn
2Σnj=1((d(ω,Kj)+1)k). Then the problem of determining the maximum likeli- hood estimates of ω? consists in exhibiting the interpre- tations ω |= µ such that αn
2Σnj=1((d(ω,Kj)+1)k)
is maximal.
Obviously, αn
2Σnj=1((d(ω,Kj)+1)k)
is maximal if and only if Σnj=1((d(ω, Kj) + 1)k)is minimal, andΣnj=1((d(ω, Kj) + 1)k) is minimal if and only if the weighted sum of Ham- ming distances of ω to every base from the profile E = hK1, . . . , Kni is minimal. This shows that every∆d,Σk is an MLE forP.
Finally, it is well-known [Konieczny and Pino Pérez, 2002b] that for a sufficiently large k whatever the pseudo-distance d over interpretations and the profile E = hK1, . . . , Kni, two worlds ω and ω0 are such that Pn
i=1d(ω, Ki)k ≤ Pn
i=1d(ω0, Ki)k if and only if the leximax aggregation of the valuesd(ω, K1), . . . , d(ω, Kn) is less than or equal to the leximax aggregation of the values d(ω0, K1), . . . , d(ω0, Kn) w.r.t. the lexicographic ordering. Hence an interpretationωminimizing theleximax aggregation of the values d(ω, K1), . . . , d(ω, Kn) w.r.t.
the lexicographic ordering maximizes the probability Πnj=1 α
2(d(ω,Kj)+1)k whenkis sufficiently large. Accordingly, the merging operator∆d,leximax is also an MLE forP. 2
Proposition 5 dswapis a pseudo-distance between formulae.
Proof:
• Symmetry: Obvious.
• Separation: Suppose thatdswap(φ, φ0) = 0. So| {ω ∈ Ω | ω |= φ∧ ¬φ0} |= 0, i.e., φ |= φ0; and | {ω ∈ Ω | ω |= φ0∧ ¬φ} |= 0, i.e., φ0 |= φ. As a conse- quence, we haveφ ≡ φ0. The converse implication is straightforward.
2 Proposition 6 Letωbe any model of the integrity constraint µ, φ be any formula having ω as its single model, and let Ki, Kj be two belief bases of E. If dswap(Ki, φ) ≥ dswap(Kj, φ), thenPµ,ωwsn(Ki)≤Pµ,ωwsn(Kj).
Proof: LetKibe any belief base of the profileE. Since Pwsnsatisfies the independence condition, we have that:
Pµ,ωwsn(Ki) = Πω|=KiPµ,ωwsn((ω, i))
×Πω0|=Ki∧µ|ω06=ωPµ,ωwsn((ω0, i))
×Πω06|=Ki∧µ|ω06=ωPµ,ωwsn((ω0, i))
×Πω6|=KiPµ,ωwsn((ω, i))
Clearly, Πω|=KiPµ,ωwsn((ω, i)) = 1 − p if ω |= Ki
and = 1 otherwise (it is an empty product). Similarly, Πω6|=KiPµ,ωwsn((ω, i)) = pif ω 6|= Ki and= 1 otherwise.
Hence:
Pµ,ωwsn(Ki) = Πω|=Ki(1−p)
×Πω0|=Ki∧µ|ω06=ωp
×Πω06|=Ki∧µ|ω06=ω(1−p)
×Πω6|=Kip
Pµ,ωwsn(Ki) = Πω0|=(Ki⊕φ)(1−p)×Πω06|=(Ki⊕φ)p Pµ,ωwsn(Ki) = (1−p)|[Ki⊕φ]|×p2m−|[Ki⊕φ]|
Pµ,ωwsn(Ki) = (1−p)dswap(Ki,φ)×p2m−dswap(Ki,φ) Asp < 12, the conclusion follows. 2
Proposition 7 ∆dD,Σis an MLE forPwsn. Proof: The proof is organized into two lemmata:
Lemma 1 A merging operator∆that is a MLE forPwsnis such that for any integrity constraintµ, and any profileE, a worldωis a model of∆µ(E)if and only if minimizes the sum of the swap distances of every base ofEtoφ, whereφis any formula havingωas its single model.
Proof: Since Pwsn satisfies the independence condition, we have Pµ,ωwsn(E) = Πnj=1Pµ,ωwsn(Kj).
From Proposition 6, provided that Pwsn sat- isfies the uniformity condition, we also have:
Πnj=1Pµ,ωwsn(Kj) = Πnj=1pdswap(Kj,φ)
×(1−p)2m−dswap(Kj,φ) Πnj=1Pµ,ωwsn(Kj) = pΣnj=1dswap(Kj,φ)
×(1−p)n×(2m−Σnj=1dswap(Kj,φ)) Finally, since p < 12, when ∆ is a MLE for Pwsn, the models of ∆µ(E) are the worlds ω that minimizes
Σnj=1dswap(Kj, ω?). 2
Lemma 2 Letµbe any integrity constraint and let E be a profile of belief bases. Every modelωof∆dµD,Σ(E)minimizes the sum of the swap distances of every base ofEtoφ, where φis any formula havingωas its single model.
Proof: Whenever a formulaφhas a single modelω, it is easy to check that for every baseK:
dswap(φ, K) =
|[K]| −1 ifω|=K
|[K]|+ 1 otherwise.
Asdswap(φ, E) = Pn
i=1dswap(φ, Ki), an interpretationω minimizingdswap(φ, E)is an interpretation which satisfies a maximal number of basesKiofE:
dswap(φ, E)is minimalif and only ifω|= ∆dµD,Σ(E).
22
Proposition 8 ∆dH,Σis an MLE forPv.
Proof: Letw be any interpretation. LetdH,Σ(w, E)de- notedH(w, K1) +. . .+dH(w, Kn), where dH(w, Ki) = minω∈KidH(w, ω). Let ωiw be a model of Ki such that dH(w, ωiw) =dH(w, Ki).
We havedH,Σ(w, E) =dH(w, ω1w) +. . .+dH(w, ωnw).
Now, by definition of the atom swap noise, the probabil- ity that ωwi is a model of Ki given the interpretation w is Pµ,wv ((ωwi, i)) =pdH(w,ωwi)+1. In addition
P((ωw1,1)∩. . .∩(ωnw, n)) =pdH(w,ω1w)+1×. . .×pdH(w,ωwn)+1 since the atom swap noise satisfies independence. Further- more
pdH(w,ωw1)+1×. . .×pdH(w,ωwn)+1
=pn×pdH(w,ω1w)+...+dH(w,ωwn)=pn×pdH,Σ(w,E). We want to prove that∆dH,Σis an MLE forPv, which means that[∆dµH,Σ(E)] =argmaxωPµ,ω(E). This is a direct con- sequence of the following equivalences:
w∈argmaxωPµ,ω(E)
⇔ ∀w0|=µ, Pµ,w(E)≥Pµ,w0(E)}
⇔ ∀w0|=µ, pdH(w,ωw1)+...+dH(w,ωnv)≥ pdH(w0,ωw
0
1)+...+dH(w0,ωnw0)
⇔ ∀w0|=µ, dH(w, ω1w) +. . .+dH(w, ωnw)
≤dH(w0, ω1w0) +. . .+dH(w0, ωwn0)
⇔ ∀w0|=µ, dH,Σ(w, E)≤dH,Σ(w0, E)
⇔w|= ∆dµH,Σ(E). 2