OATAO is an open access repository that collects the work of Toulouse
researchers and makes it freely available over the web where possible
Any correspondence concerning this service should be sent
to the repository administrator: tech-oatao@listes-diff.inp-toulouse.fr
This is an author’s version published in: http://oatao.univ-toulouse.fr/24811
To cite this version: Bounhas, Myriam and Prade, Henri and
Richard, Gilles Oddness/evenness-based classifiers for Boolean or
numerical data. (2017) International Journal of Approximate
Reasoning, 82. 81-100. ISSN 0888-613X
Official URL:
https://doi.org/10.1016/j.ijar.2016.12.002
Oddness/evenness-based
classifiers
for
Boolean
or
numerical data
Myriam
Bounhas
a,
b,
Henri
Prade
c,
d,
∗
,
Gilles
Richard
c aLARODECLaboratory,ISGdeTunis,41ruedelaLiberté,2000Le Bardo,TunisiabEmiratesCollegeofTechnology,P.O.Box:41009,AbuDhabi,UnitedArabEmirates cIRIT,UPS-CNRS,118routedeNarbonne,31062ToulouseCedex,France dQCIS,UniversityofTechnology,Sydney,Australia
a
b
s
t
r
a
c
t
Keywords: Logicalproportion Classification Booleandata Numericaldatak-Nearestneighborsmethod
In this paper, we propose two viewpoints for estimating to what extent a new item, described intermsofbinary-valued features, fits with asetof existingitems. Theyare respectively basedonan oddnessindexand an evennessindex, whichinspite oftheir names, are not exactly the oppositeof each other. Both indicators, whichreferto one feature,arebuiltfromheterogeneouslogicalproportions,andinvolvefouritems,thenew itemandthreeothers.LogicalproportionsareBooleanfunctionsthatrelatefourvariables through comparisonsbetween pairs of them.Heterogeneous onesexpress thatthere is anintruderamongfourtruthvalues,whichisforbiddentoappearinaspecificposition. Global oddnessand evennessfunctions of anitemwith respectto aset are built from thecorrespondingindexesbytakingallfeaturesintoaccount,andthenbyconsideringall triplesofitemsintheset.Moreovertheoddnessfunctionnaturallyextendstonumerical featuresandtosubsetsofitemsofdifferentsizes(pairs,triples,etc.).Simpleclassification procedures canbebasedontheseglobalfunctions: anewitemisassignedto theclass thatminimizesoddnessormaximizesevenness.Experimentsonclassicalbenchmarkswith Boolean,ornumericaldata(foroddness)showthattheresultsarecompetitivewithother classificationmethods.
1. Introduction
Ithasbeenacknowledgedforalongtimethat proportionsplayan importantroleinourperceptionandunderstanding of reality. Indeed proportions are a matter of comparisons expressed by differencesor ratios that are equated to other differencesorratios.Twocenturiesago,Gergonne[10,11]wasthefirsttoexplicitlyrelatenumerical(geometric)proportions totheideasofinterpolationandregression.
Itisonlyinthelastdecadethatanalogical proportions,i.e., statementsoftheform A isto B asC isto D,whereeach capitalletter refers to asituation described by a vectorof featurevalues, havebeenformalized firstin termsof subsets of properties that hold true in a given situation [12,26], andthen in a logical manner [16]. Quite early, it was shown
*
Correspondingauthorat:IRIT,UPS-CNRS,118routedeNarbonne,31062ToulouseCedex,France.E-mailaddresses:Myriam_Bounhas@yahoo.fr(M. Bounhas),prade@irit.fr(H. Prade),richard@irit.fr(G. Richard).
that a formal view of analogical proportions maybe the basis of a new type of classifier that performs well on some difficult benchmarks [1,15].This was confirmed by other implementations directly based on a logicalview ofanalogical proportions[3].
Besides, itwas shownthat analogical proportionsbelong toa larger familyofso-calledlogicalproportions thatrelate a 4-tupleofBooleanvariables[17],wherethe8code-independentlogicalproportionsareofparticularinterestsincetheir truth status remain unchangedif a property is encoded positively or negatively.These 8 logical proportions divide into 4 homogeneous proportions,which includethe analogical proportionand3 relatedproportions,and4 heterogeneous
pro-portions [20].A heterogeneous proportionexpresses the idea that thereis an intruder amongthe 4truth values, which is forbiddentoappearina specificposition.Intuitivelyspeaking,an item properlyassignedtoa classshouldnot be (too much)anintruderinthisclass.Itsuggeststhatheterogeneousproportionsmaybealsoofinterestasabasisfordesigning anewtypeofclassifier.Thisisthetopicofthepaper.
It isa commonsenseprincipletoconsiderthataclass cannotbereasonablyassignedtoanewitem ifthisitemwould appear to be at odds with respectto the known members of theclass. On the contrary, theitem should be even with respect to theseclass members forentering the class. The useof proportions leads tothe ideaof considering triples of elements ina class asa basis forestimating theevenness orthe oddness ofa new item withrespect tothe class. This departs from theusual view wherethe estimationof the(non) agreementof anewitem withrespect toa set ofitems amountstocomparetheitem,featurebyfeature,withadistributionofthefeaturevaluesinthewholeset.
Anoddnessindexandanevennessindex,whicharenottheexactoppositeofeachother,areproposed.Intheevenness view,triplesaretheonlysubsetswherewhenthenewitemconformswiththeminorityforagivenBooleanfeature,there isnolongeranymajority(withrespecttothisfeature)inthetripleaugmentedwiththenewitem.Thenonecanestimate towhatextentanewitemfitswiththemajorityofelementsinanytripleofmembersofaclassonasetoffeatures.
However, it isunclear howto extendtheevenness-based approachto numericaldata.A slightly differentview,based onthedirectestimationofoddness,whichcanstillberelatedtoheterogeneouslogicalproportions,canbealsoconsidered. Thisleadstoanoddnessmeasurethatcanbeextendedtonumericalfeaturesinastraightforwardmanner,andthatcanbe alsogeneralizedtosubsetsofanysizeandnotonlyontriples.Thusinthispaper,theevaluationoftheevennessorofthe oddness ofanitem withrespecttoa classreliesonalocal view,wherethe newitem, shouldappeareven/notappearat oddswithamaximumnumberof(small)subsetsofaconsideredclass.
The paperisorganizedasfollows.Thenext sectionprovides thenecessarybackgroundonBooleanlogicalproportions, introducingthetwotypesofproportions:thehomogeneousonesandtheheterogeneousones,byespeciallyemphasizinga code independencyproperty.Results areestablished that singleout theseproportionsintermsofparticular featuresthat are meaningfulwhenitcomesto classification.Then,basedonheterogeneousproportions,oddnessandevennessindexes are introducedinSection 3,andtheirexact relationshipestablished.The extensionofoddness andevenness indexestoa numericalfeatureisdiscussed.Section4describesheterogeneousproportions-basedclassifiers.Weshowhowtheproposed oddness andevennessindexescanprovidea basisforestimatingtheoddnessortheevenness ofanitem withrespectto a wholeclass. Therelatedwork Section5 providesa briefoverviewofanalogicalproportions-basedclassifiers, which are based on ahomogeneous logical proportion, butwork quite differentlyfromheterogeneous proportions-basedclassifiers. Section 6isdevotedtoa setofexperimentsonstandardbenchmarkscomingfromtheUCIrepository.Theyarecompared both with classical classifiersandwithanalogical proportions-basedclassifiers. Finally, we provide some hintsfor future worksandconcludingremarksinSection7.
Thispapergathersandsubstantiallyextendsapproachesandresultspartiallyreportedinthreeconferencepapers[5–7].
2. Heterogeneousproportionsvs.homogeneousproportions
Logical proportionsare thebasicingredientsofourapproach.TheseproportionsareBooleanformulasinvolving4 vari-ables.Theyhavebeendeeplyinvestigatedin[19].Inthefollowingsection,wefirstrecallhowtheyarebuilt,thenwefocus ontheproportionsthatwewilluseinthispaper.
Asfornotations,thepaperusesthestandardonesforBooleanconnectives,namely
∨,
∧,
≡,
→
fordisjunction, conjunc-tion,equivalenceandmaterialimplicationrespectively.2.1. Backgroundonlogicalproportions
Alogicalproportionstatesarelationbetween4itemsthatisexpressedintermsofcomparisonsbetweenpairsofitems, eachitembeingrepresentedasasetofBooleanfeatures.Considering2Booleanvariablesa andb correspondingtothesame feature attachedto2items A and B,a
∧
b anda∧
b indicatethat A and B behavesimilarlyw.r.t.thegivenfeature(they are called“similarity”indicators),a∧
b anda∧
b thefactthat A and B behave differently(they arecalled“dissimilarity” indicators).Whenwehave4items A,
B,
C,
D,forcomparingtheirrespectivebehaviorinapairwisemanner,weareledto considerlogicalequivalencesbetweensimilarity,ordissimilarityindicators,suchasa∧
b≡
c∧
d for instance.Thisenables ustodefinealogicalproportion[19]:Definition1.A logicalproportion T
(
a,
b,
c,
d)
isthe conjunctionoftwo equivalencesbetweenindicatorsfor(
a,
b)
on one sideandindicatorsfor(
c,
d)
ontheotherside.Forinstance,
((
a∧
b)
≡ (
c∧
d))
∧ ((
a∧
b)
≡ (
c∧
d))
is alogicalproportion. It hasbeenestablished thatthere are120 syntacticallyandsemanticallydistinctlogicalequivalences.Therearetwowaysfordistinguishingremarkablesubsetsamong the120proportions:eitherbyinvestigatingtheirstructure,orbyinvestigatingtheirsemantics(i.e.theirtruthtable).Inthis section, weshallsee thatboth investigationslead tothe sameconclusion:there isa classof8proportions whichstands outofthecrowd.Thisclasscanbesubdividedinto2sub-groupsof4proportions.Indeed a propertythat appears tobe paramountin manyreasoning tasks is codeindependency: there should be no distinction when encoding information positively or negatively. In other words, encoding truth (resp. falsity) with 1 or with 0(resp.with0and 1)isjustamatterofconvention,andshouldnotimpactthefinalresult.Whendealingwithlogical proportions,thispropertyiscalledcodeindependency andcanbeexpressedas
T
(
a,
b,
c,
d)
→
T(
a,
b,
c,
d)
Fromastructuralviewpoint,rememberthataproportionisbuiltupwithapairofequivalencesbetweenindicatorschosen among16equivalences. So,toensure codeindependency,the onlywaytoproceed istofirst choosean equivalencethen topairitwithitscounterpartwhereeveryliteralisnegated:forinstancea
∧
b≡
c∧
d shouldbepairedwitha∧
b≡
c∧
dinordertogetacodeindependentproportion.Thissimplereasoningshowsthatwehaveonly16/2
=
8 codeindependent proportionswhoselogicalexpressionsaregivenbelow.A
: ((
a∧
b)
≡ (
c∧
d))
∧ ((
a∧
b)
≡ (
c∧
d))
R: ((
a∧
b)
≡ (
c∧
d))
∧ ((
a∧
b)
≡ (
c∧
d))
P: ((
a∧
b)
≡ (
c∧
d))
∧ ((
a∧
b)
≡ (
c∧
d))
I: ((
a∧
b)
≡ (
c∧
d))
∧ ((
a∧
b)
≡ (
c∧
d))
H1: ((
a∧
b)
≡ (
c∧
d))
∧ ((
a∧
b)
≡ (
c∧
d))
H2: ((
a∧
b)
≡ (
c∧
d))
∧ ((
a∧
b)
≡ (
c∧
d))
H3: ((
a∧
b)
≡ (
c∧
d))
∧ ((
a∧
b)
≡ (
c∧
d))
H4: ((
a∧
b)
≡ (
c∧
d))
∧ ((
a∧
b)
≡ (
c∧
d))
Only4amongtheseproportionsmakeuseofsimilarityanddissimilarityindicatorswithoutmixingthesetypesofindicators insideone equivalence:forthisreason, these4proportions A
,
R,
P,
I are calledhomogeneousproportions.Forinstance,an informal readingof A would be:“a differs fromb asc differsfromd and viceversa.” Thisexpresses themeaning ofan analogical proportion, i.e., a statement ofthe form“a is to b asc isto d”. Wecan consider A as aBoolean counterpart tothe ideaofnumericalproportion, eithergeometric,i.e., ab=
dc,orarithmetic a−
b=
c−
d.It mayalsobeviewedasa qualitativeformofcomparisonofdifferences,reminiscenttotheconceptofderivative wherewestudytheratio f(aa)−−bf(b) with f(
a)
=
c and f(
b)
=
d,whichisclosetonumericalproportions.Theideaofaproportionsuggeststhatsome stabilitypropertiesholdw.r.t.permutations.Indeed,wecanpermute vari-ables and check, for instance,if a given proportion still holds when permuting the 2 first variables. We denote pi j the permutationofvariableinpositioni withvariableinposition j.Forinstance,p14 permutesthevariablesinextreme posi-tions1 and4,while p23permutesvariablesinmeanpositions.And p12
(
a)
=
b,
p12(
b)
=
a,
p12(
c)
=
c,
p12(
d)
=
d.Definition2.AproportionT isstablew.r.t.permutationpi j iff T
(
a,
b,
c,
d)
→
T(
pi j(
a),
pi j(
b),
pi j(
c),
pi j(
d))
Itcan becheckedthat A isstablew.r.t.theextremes p14 orthemeans p23 permutations. P isstableforp12and p34 permutations,while R is stablefor p13 andp24.Moreover A
,
R,
P,
I are symmetrical(i.e. T(
a,
b,
c,
d)
→
T(
c,
d,
a,
b)).
This isobservableontheirtruthtables:SeethetoppartofTable 1,whereonlythe6patternsthatmakethelogicalproportions true appear). Besides, I is the only logical proportion that is stable w.r.t. any permutation of two ofits variables. This noticeableresultisprovedin[18].Moreover, R andP arecloselyrelatedto A viapermutations.Namelywehave
A
(
a,
b,
c,
d)
≡
P(
c,
b,
a,
d)
≡
R(
b,
a,
c,
d)
Infact,whend isfixed,exchangingthevariablesa
,
b,
c amountstomovefromonehomogeneousproportiontoanother, orto remainstable( A(a,
b,
c,
d)
=
A(
a,
c,
b,
d);
P(
c,
b,
a,
d)
=
P(
b,
c,
a,
d);
R(
b,
a,
c,
d)
=
R(
c,
a,
b,
d)),
with I remainingan exception. Thus A,
R,
P collectively maintain a formof exchangeability property withrespect to a,
b,
c, while I ensuresit byitself. Theseexchangeabilitypropertiesare ofparticular interestwhen applyinghomogeneous logicalproportionsto classification.
The 4 remaining code independent logical proportions H1
,
H2,
H3,
H4 are calledheterogeneousproportions: it is clear fromtheirlogicalexpressionthattheymixsimilarityanddissimilarityindicatorsinsideeachequivalence.TheirtruthtablesTable 1
Homogeneous/heterogeneousproportionsvalidpatterns.
A R P I 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 1 0 H1 H2 H3 H4 1 1 1 0 1 1 1 0 1 1 1 0 1 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 1 1 0 1 1 1 0 1 1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0
areshowninthebottompartofTable 1,whereonlythe6patternsthatmakethemtrueappear.Theindexi inHirefersto apositioninsidetheformulaHi
(
a,
b,
c,
d).
Namely,ascanbechecked,ineachofthese6patternsthereisaminorityvalue (i.e.,thevaluehavingthesmallestnumberofoccurrencesinthepattern,thenthisvalueislikeanintruderamongtheother values), andi is theonlypositionwherethe minorityvalue neverappears amongthe64-tuplesofvaluesthat make Hi true.ByexaminingthetruthtableoftheheterogeneousproportionsinTable 1,wegetthefollowingproperties:
A
(
a,
b,
c,
d)
∧
R(
a,
b,
c,
d)
∧
P(
a,
b,
c,
d)
∧
I(
a,
b,
c,
d)
= ⊥
togetherwith
(
A(
a,
b,
c,
d)
∧
R(
a,
b,
c,
d)
∧
P(
a,
b,
c,
d))
≡
Eq(
a,
b,
c,
d)
where Eq
(
a,
b,
c,
d)
=
1 ifa=
b=
c=
d and Eq(
a,
b,
c,
d)
=
0 otherwise.Similarly,forheterogeneousproportions,wehaveH1
(
a,
b,
c,
d)
∧
H2(
a,
b,
c,
d)
∧
H3(
a,
b,
c,
d)
∧
H4(
a,
b,
c,
d)
= ⊥
whichimplies:(
H1(
a,
b,
c,
d)
∧
H2(
a,
b,
c,
d)
∧
H3(
a,
b,
c,
d))
→ ¬
H4(
a,
b,
c,
d)
(1) Obviously, we have similar properties by permuting the indexes of the Hi’s. The meaning of the conjunctionH1
(
a,
b,
c,
d)
∧
H2(
a,
b,
c,
d)
∧
H3(
a,
b,
c,
d)
will be discussed in the following section, devoted to heterogeneous propor-tions.2.2. Specificityofheterogeneousproportions
In order to get a clear understanding of the heterogeneous proportions and to extract relevant properties, we now investigatetheirtruthtables.
2.2.1. Heterogeneityandexchangeability
StillwithinTable 1,anobvioussemanticsappears:Hiholdswhenthereareexactly3parameterswithidenticalBoolean values(
=
1 forexample)andtheparameterinpositioni isoneoftheseidenticalvalues.Definition3.Given4 Boolean values a
,
b,
c,
d in thisorder such that 3 of them are identical and the remaining one is different,thepositioni∈ [
1,4]
ofthisremainingvalueiscalledtheintruderposition ortheintruderforbrevity.Then, Hiholdsiffthereisanintruderamongthe4valuesa
,
b,
c,
d andtheintruderpositionisnoti.ThissuggeststhatHishouldbestablew.r.t.thepermutationswhichdonotaffectpositioni.Infactalittlebitmorecanbeestablished:
Property1. ApartfromI,Hiaretheonlylogicalproportionsstablew.r.tanypermutationwhichdoesnotaffectpositioni.
ThespecialcaseofI stablew.r.t.anypermutationhasbeenalreadyprovedin[19].Table 1allowstocheckthattheHi’s arestablew.r.t.thepermutationswhichdonotaffectpositioni.Showingthattheyaretheonlyonesamongthe120logical propertiesstablew.r.t.thesepermutationsrequiresatediouscheckingprocedurethatcannotbesummarizedhere.
Property 1isquitesatisfactoryandconfirmstheinformalsemanticsofHi.Thiswillbeusefulwhenusingheterogeneous proportionstoclassification.Moreimportantly,thisgivesaclearsemanticstotheconjunctionH1
(
a,
b,
c,
d)
∧
H2(
a,
b,
c,
d)
∧
H3
(
a,
b,
c,
d)
above: H1(
a,
b,
c,
d)
∧
H2(
a,
b,
c,
d)
∧
H3(
a,
b,
c,
d)
holdsiffamongthe4valuesa,
b,
c,
d,thereisan intruder andthisintruderisd.Thiswillbethebasisofouroddnessmeasure.Inthenextsubsection,weestablishsomeresultsabout theparityofthenumberof1or0intruthtablesforheterogeneousproportions,whicharecontrastedwithhomogeneous proportions.Thisleadstoamodelofoddnessofagivenvalue,amongamultisetof4values.2.2.2. Parityofthenumberof1or0intables
SincelogicalproportionsareBooleanformulasinvolving4variables,theirtruthtableshave16rows,whereonly6lead to1(see[19]foracompleteinvestigation).Onecouldaskifanytruthtablehaving6linesleadingto1and10linesleading to0corresponds toalogicalproportion.A simplenumberingargumentshowsthatthisisnotthecase.Ontopofthat,we canbuildclassesofpatternswhichcannotbevalidforanyproportion:
Property2. Thereisnologicalproportionthatistrueforthefourelementsofthesetofvaluations
{
0111,1011,1101,1110}
.The sameholdsfor{
1000,0100,0010,0001}
.Proof. Anequivalencebetweenindicatorsisoftheforml1
∧
l2≡
l3∧
l4.Ifthisequivalenceisvalidfor{
0111,1011}
,itmeans that itstruth value doesnot changewhenwe switch thetruth value ofthe2 firstliteralsfrom0 to1: thereare only2 indicatorsfora andb satisfyingthisrequirement:a∧
b anda∧
b.Ifthisequivalenceisstillvalidfor{
1101,1110}
,itstruth valuedoesnotchangewhenweswitchthetruthvalueofthe2lastliteralsfrom0 to1:thereareonly2indicatorsforc and d satisfyingthisrequirement:c∧
d andc∧
d.Thentheequivalencel1∧
l2≡
l3∧
l4isjusta∧
b≡
c∧
d,a∧
b≡
c∧
d,a∧
b≡
c∧
d ora∧
b≡
c∧
d.Noneoftheseequivalencesistrueforthefourelementsofthesetofvaluations{
0111,1011,1101,1110}
. Thesamereasoningisstillapplicablefortheotherclass.2
Applying a similar reasoning, we can build other set of valuations which cannot make simultaneously true a logical proportion.
Property3. Alogicalproportioncannotbemadetruebyasetof4valuationsincluding3valuationsofoneoftheclassesappearingin
Property 2andwherethe4thvaluationisjustthenegationcomponentwiseoftheremainingvaluationoftheclass.
Forinstance,thereisnologicalproportiontruefor
{
0111,1011,1101,0001}
orfor{
0111,0100,1101,1110}
. Thisremarkhelpsestablishingthefollowingresult:Property4. Heterogeneousproportionsaretheonlyproportionswhoseallvalidpatternshaveanoddnumberof1.
Proof. Fromthetruthtables, weobservethatonlyvalidpatternsforheterogeneousproportionshaveanoddnumberof1. Letusnowconsideraproportionwhose6validpatternscarry anodd numberof1. Asthereare exactly8patternswith anoddnumberof1,andthankstothepreviousproperty,thisproportionincludesnecessarily3patternsfromeach ofthe previousclasses.Ifthevalidpatternsinoneclassareobtainedfromthevalidpatternsfromtheotherclassjustbynegating all the variables,the proportionis codeindependent andthen, it isa heterogeneous proportion. Inthe opposite case, it means that we have atleast one pattern inthe first class withno negated counterpart in the other class: for instance, 1110,1101,1011 arevalidbut0001 isnota validpattern, leavingonly1000,0100,0010 tocompletethetruth tableofa logicalproportion.ThenProperty 3tellsthatthereisnoproportionvalidfor1110,1101,1011,1000.
2
Asimilarpropertyholdsforhomogeneousproportions:
Property5. Homogeneousproportionsaretheonlyproportionswhoseallvalidpatternshaveanevennumberof1.
Wenowshowthespecificityofheterogeneous(andhomogeneous)proportionsfromareasoningpointofview.
2.2.3. Inferenceandunivocalproportions
Thereisawaytoinferunknownpropertiesofapartiallyknownobject D startingfromtheknowledge wehaveabout itsotherspecifiedproperties,andassumingthatalogicalproportionT holdscomponentwisewiththreeotherobjectsA, B, C ,alsorepresentedintermsofthesamen Booleanfeatures.Thiscanbedoneviaaninductionprinciplethatcanbestated asfollows(where J isasubsetof
[
1,n]
,andxidenotesthetruthvalueoffeaturei forobjectX∈ {
A,
B,
C,
D}
):∀
i∈ [
1,
n] \
J,
T(
ai,
bi,
ci,
di)
∀
i∈
J,
T(
ai,
bi,
ci,
di)
Thiscan be seen asa continuityprinciple assuming that ifit isknown thata proportionholds forsome attributes, this proportionshouldstillholdfortheotherattributes.Itgeneralizestheinferenceprincipleusedwiththeanalogicalproportion
no guaranteethatthe conclusionholdswhen thepremisseshold. Nevertheless,speciallywhentheratio |nJ| is closeto 1, which meansthat proportionsholdon alarge numberofattributes, itis naturalto considerthat such aproportionmay alsoholdonthesmallnumberofremainingattributes.
This principle requires the unicity of the solution of equation T
(
a,
b,
c,
x)
=
1 where x is unknown, when it exists. Namely,given3Booleanvaluesa,
b,
c,wewanttodetermineforwhatlogicalproportionT theequation T(
a,
b,
c,
x)
=
1 is solvable,andinsuchacase,ifthesolutionisunique.Definition4.If, whentheequation T
(
a,
b,
c,
x)
=
1 issolvable, thesolutionisunique, thentheproportionT is saidto be 4-univocal. Inasimilarmanner,onemaydefineproportionsthatare1,2,or3-univocal.T isunivocal whenitisi-univocalforeveryi
∈ [
1,4]
.Firstofall,itiseasytoseethattherearealwayscaseswheretheequationT
(
a,
b,
c,
x)
=
1 hasnosolution,whateverthe proportionT .Indeed,thetriplea,
b,
c maytake23=
8 values,whileanyproportionT istrueonlyfor6distinctvaluations, leavingatleast2caseswithnosolution.Forinstance,whenwedealwithH4,theequationsH4(0,
0,0,x)
andH4(1,
1,1,x)
havenosolution.Wehavethefollowingresult:
Property6. Thehomogeneousandtheheterogeneousproportionsaretheonlyproportionswhichareunivocal.
Proof. From thetruthtables, weseethat the2typesofproportionssatisfytheproperty. Now,a proportionwhich isnot
i-univocalis necessarilyvalidboth forapatternwithan odd numberof1,andfora patternwithan evennumber of1. Then,Properties 4 and5excludehomogeneousandheterogeneousproportions.
2
Atthisstep,weseethatheterogeneousproportionsallowtosingleoutaparticularpositionamonganorderedlistof4 values.Thispositiontargetsthevaluewhichisdefinitelynotanintruderamongthemultisetof4items.Forinstance,when
Hiisvalid,thevalueinpositioni isnotanintruder.Weshallseeinthefollowingsectionthatthispropertycanbeusedto checktheoddnessofagivenitemw.r.t.amultisetofelements.
3. Evaluatingthepresenceortheabsenceofanintruderinamultiset
Based onheterogeneousproportions,we definean oddnessindexandthenan evennessindex, beginningineach case with Booleanvalues, beforeinvestigating their extension to dealwith gradedtruth values.We also laybare the relation betweenevennessandoddness.
TheseoddnessandevennessindexespertaintoaBoolean,oranumericalvalue(denoted x ord inthefollowing)w.r.t. a multiset S ofsuch values.Forreasonsexplainedinthissection,it willappeardesirableto keepthismultisetsmall(i.e.,
|
S|
=
3, ormaybe2intheoddnesscase).Moreover, inthissectionthe valuesx orin S maybe thoughtofasthevalues ofthesamefeaturefordifferentitems.InSection4,weshallbuildoddnessandevennessmeasuresfromtheseindexesby cumulatingthemoverfeatures,andbyconsideringcollectionsofmultisetsS withintheexamplesdescribingthesameclassC
inatrainingset.3.1. Oddnessindex
The ideaofoddness introducedbelowdirectlyreliesontheevaluationoftheextenttowhichanewitemtobeadded toasubsetreinforcesitsheterogeneityandsoappearsasanintruder init.
3.1.1. AnoddnessmeasureforBooleandata
Inthefollowing,wefirstdefinean indextoevaluatethe oddness ofanewcomer w.r.t.amultisetofBooleanvaluesvia theheterogeneousproportions,wherethemultisetisatriple.Then,weextendthisindextomulti-valuedlogicinthenext subsection,beforegeneralizingtheextendedoddnessindextomultisetsofanysize.
Let usrememberthe meaning of Hi: Hi holds iffthere isan intruder amonga
,
b,
c,
d and the parameterin positioni isnot thisintruder.As showninTable 2,each proportion Hi providesa pieceof knowledgeonthe intruderandwhen combined withother pieces, wecan pickout whichoneis theintruderamonga
,
b,
c and d.Forexample H1(
a,
b,
c,
d)
=
H2
(
a,
b,
c,
d)
=
H3(
a,
b,
c,
d)
=
1 meansthatthereisanintruderwhichisoutofthemultiset{
a,
b,
c}
. Thenwedefinetheoddnessofd w.r.t.{
a,
b,
c}
bythefollowingformula:O dd
(
{
a,
b,
c},
d)
=
de f H1(
a,
b,
c,
d)
∧
H2(
a,
b,
c,
d)
∧
H3(
a,
b,
c,
d)
(2) Asanimmediateconsequenceofequation(1),wehave:Table 2
H1,H2,H3andO dd truthvalues.
a b c d H1 H2 H3 Odd 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 0 1 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 1 0 1 1 1 0 0 1 1 1 0 1 1 1 1 1 1 1 1 0 0 0 0
DuetothepermutationpropertiesoftheHi’s,therighthandsideofthisdefinitionisstablew.r.t.anypermutationofa
,
b,
c, thenthemultisetnotationonthelefthandsideisjustified.Thetruthtableof O dd isgiveninTable 2.Itisclearthat O dd holdsonlywhenthevalueofd isseenasodd amongtheothervalues:d istheintruder.Moreover
O dd does not hold inthe opposite situation where there is a majority amongvalues in a
,
b,
c,
d and d belongs to this majority(e.g. O dd(
{
0,1,0},
0)=
0),orthereisnomajorityatall(e.g. O dd(
{
0,1,1},
0)=
0).AsimpleobservationofTable 2showsthattheoddnessindexcanberewrittenas
O dd
(
{
a,
b,
c},
d)
≡ ((
a∨
b∨
c)
≡
d)
∧ (
a∧
b∧
c)
≡
d))
(3) Givenamultiseta,
b,
c of3identicalBooleanvalues, O dd(
{
a,
b,
c},
d)
canthenactasaflagindicatingifthe4thvaluedisdifferentfromthecommonvalueofa
,
b,
c.Thenthevalued isatoddsw.r.t.theothervalues.3.1.2. Extensiontonumericaldata
It is possible to extend the previous oddness measure in order to handlevariables with graded values (i.e. variables whosevaluesbelongto
[
0,1]
,afterasuitablenormalizationofnumericaldata).Fornow,thisoddnessisjust0 or1 (i.e.the truthvalue of O dd(
{
a,
b,
c},
d)),
butwe wouldliketoconsidertuplessuch as(0.1,
0.2,0.1,0.8)andstillconsiderthatthe 4thvalueissomewhatoddw.r.t.the3otherones.Adirecttranslationofformula(2),takingmin for
∧
,max for∨
,and1− |
· −
· |
for≡
asinŁukasiewiczlogic[23],leads to:O dd
(
{
a,
b,
c},
d)
=
de f min(
|
max(
a,
b,
c)
−
d|, |
min(
a,
b,
c)
−
d|)
(4) Firstofall,itisaneasygametocheckthat O dd remainscodeindependentwithgradedvalues,i.e.changingvaluesinto theircomplementto1.Letusexamine some examples toget aprecise understanding ofthe formulafornumericaldata andtocheck ifthis
oddness measurefitswiththeintuition.
•
We see that O dd(
{
u,
u,
u},
v)
= |
u−
v|
.Indeed, ifu=
v then obviouslythe 4thvalue isnot an intruder. The larger|
u−
v|
,themorev isatoddsw.r.tthe3valuesequaltou.•
Weseealsothat O dd(
{
v,
u,
u},
v)
=
0 whichisconsistentwiththeexpectedsemanticsof O dd.•
Generally,O dd(
{
u,
v,
w},
max(u,
v,
w))
=
O dd(
{
u,
v,
w},
min(u,
v,
w))
=
0,andinanycase, O dd(
{
u,
v,
w},
u)
≤
0.5.•
Letusnowconsider anumericalsituation with4differentnumericalvalues,forinstance:(
{
0,0.1,0.2},
0.9). Wefeel that d=
0.9 appears asan intruder in the multiset(
{
0,0.1,0.2})
. This is consistent with the obtained truth valueO dd
(
{
0,0.1,0.2},
0.9)=
0.7.Moreover, O dd(
{
0,0.1,0.1},
0.9)=
0.8 andO dd(
{
0,0.1,0.3},
0.9)=
0.6,whichfitswiththe intuition.•
Conversely,thepattern(
{
0.7,1,1},
0.9)doesnotstronglysuggest0.9 asanintrudervalue.IndeedO dd(
{
0.7,1,1},
0.9)=
0.1. O dd(
{
0.9,1,1},
0.7)=
0.2 isabithigher,asexpectedsincewehavemovedtowardsmoreuniformityamonga,
b,
candslightlyincreasedthedifferencesbetweend andtheelementsof
{
a,
b,
c}
.Moreover,notethatO dd(
{
0.7,1,1},
0.9)=
0.1, and O dd(
{
0,0,1},
0.9)=
0.1, since the two cases illustrate two different ways of not being really an intruder. Indeed, although 0.9 is close to the majority value in{
0.7,1,1}
in the first case, and far from the majority value in{
0,0,1}
in thesecond case, closeness to majorityvalue in{
a,
b,
c}
isnot atall what O dd estimates.Ratherit is expectedtofindsimilarestimatesinthetwoabovecases,sincetheyarerespectivelycloseto O dd(
{
1,1,1},
1)=
0 and to O dd(
{
0,0,1},
1)=
0 asshowninTable 2.•
Finally, O dd(
{
a,
b,
c},
d)
does not behave as|
d−
average(
{
a,
b,
c})|
: the cases{
a,
b,
c}
= {
0.5,0.5,0.5}
, d=
0.5, anddefini-tion, O dd
(
{
0.5,0.5,0.5},
0.5)=
0, while O dd(
{
0,0.5,1},
0.5)=
0.5. Thus O dd(
{
a,
b,
c},
d)
is a more accurate oddness measurethan|
d−
average(
{
a,
b,
c})|
when theset{
a,
b,
c}
containsheterogeneousvalues.From the previous examples,we understandthat theproposed definitionfits withtheinitial intuitionandprovides high truth valueswhend appearstobe atodds w.r.t.themultiset
{
a,
b,
c}
andlowtruth valuesintheopposite casewheredisnot verydifferentfromtheother values.Ontopofthat,theexpression ofO dd givenhereisnolongertheconjunction ofthemultiple-valuedextensions ofH1
,
H2,
H3asgivenin[20],whichwouldleadtoalesssatisfactorymeasureofoddness. Indeed, we are here interested in the oddness ofd w.r.t. a multiset{
a,
b,
c}
, andnot in picking out an intruder in the multiset{
a,
b,
c,
d}
asin[20].3.1.3. Oddnesswithrespecttomultisetsofvarioussize
Since weare interested incheckingifd seemsan intruderina givenmultiset,wemayconsidermultisets ofanysize whendefining theoddnessindex.Indeed,theprevious oddnessindexisnotlimitedtomultisets
{
a,
b,
c}
with3elements, andcan beeasily generalizedtoan indexofoddness O dd(
S,
x)
ofan item x w.r.t.a multiset S ofvalues in[
0,1]
ofany size,asfollows:O dd
(
S,
x)
=
de f min(
|
max(
S)
−
x|, |
min(
S)
−
x|)
As canbe seen,weonlycompare x totheupperandlowervaluesin S,whichmaybe reallyconsidered asameaningful summary of S onlyifS isvery small(when wehavenoadditionalinformationaboutthedistributionofvaluesin S),i.e.
|
S|
=
1,2,3 ormaybe 4.Clearly,thecomputationof O dd(
S,
x)
reflectsthesmallestdistanceofx toanelementin S onlyfor
|
S|
=
1,2.Indeed,thisisnotthe caseassoonas|
S|
≥
3 ascan beseen onthefollowing example: O dd(
{
a,
b,
c},
x)
=
O dd
(
{
0,0.5,1},
0.5)=
0.5,while|
b−
x|
= |
0.5−
0.5|
=
0.As itmaybe the caseforrealdatasets,we mayhavemissingvalues.Obviously,whenthere isa missingvalue inthe multiset S ofsizen,thenasimpleoptionistoconsiderthemultiset S ofsizen
−
1 andtoconsider O dd(
S,
x)
=
O dd(
S,
x)
where S hasnolongeranymissingvalue.3.2. Evennessindex
Adopting a dual viewpoint, we may want to know if adding a newelement to a given subset of items keeps it as homogeneousasitis,i.e.,thenewcomerdoesnotappearasanintruderinthissubset,andratheragreeswithitsmajority. Homogeneity can be considered as a kind of evenness of the newcomer w.r.t. the existing items of the subset. In this subsection,we advocatea wayofjudgingevenness onthe basisofthe majority,ifany,insidethe triples.Contrary tothe oddness definition, whereall Hi
,
i=
1,2,3 arerequired to define the oddnessindex, only H4 is neededfordefining an evennessindex,thusdenotedE ven4.3.2.1. AnevennessmeasureforBooleandata
Since theidea istoagree withamajority, wenoticethat thesmallestmultisets S ofelements wheremajoritymakes senseareclearlytriples.LetconsiderthreeBooleanvaluesa,b,c inS.Then,inaBooleanworld,therearetwopossibilities, eithera
=
b=
c,ortwoofthethreeareequal.Inbothcases,a strictmajoritytakesplace.Letm denotethemajorityvalue. Now consider the newcomer d,either d=
m, and m remains the majorityvalue in{
a,
b,
c,
d}
, ord=
m, and thereisno longeranymajorityin{
a,
b,
c,
d}
(twovaluesareequalto1andtwovaluesto0).Onlywiththefirstcase,d conformstothe majority.Notethat ifweconsiderlarger subsets S,evenwithonly4elements ratherthan3,itbecomes possiblethatthe new-comer increasesan existing minority,withoutchanging themajority.Indeed,themajorityvalue that maybe sharedby 3 elements inthe 4-elementsmultisetwillthen remainunchangedinthe5-elements multisetresultingfromthearrival of a fifthelementwhateveritsvalue. A similarphenomenontakesplaceifwe startwithlarger subsetsS having5elements ormore.Sowearelosingadistinctivepropertyof3-elementssubsetswhichhaveadifferentmajoritybehaviordepending ifd conformsornottothemajorityinthe3-elementssubset. Thismeansthattriplesaretheonlysubsetssuchthataddingan itemthatconformstothetripleminoritydestroysthemajority.Thus,3-elementssubsetsareabletoclearlydiscriminate,among differentd thosethatconformtothemajorityofthetriple.
Theideaofmajorityjustdescribedhelpsustodefineanewevenness measureviatheheterogeneousproportions.Letus recallthesemanticsof Hi: Hi holdsiffthereisanintruderamonga
,
b,
c,
d andtheparameterinpositioni isnot this in-truder.Asaconsequence,Hiimpliesthatthereisamajorityofvaluesamong(
a,
b,
c,
d)
andthevalueinpositioni conforms tothemajorityofvaluesappearingamongthe3otherpositions(i.e.themultisetofvalues{
a,
b,
c,
d}
ismoreorlesseven). But thereverse implicationdoesnot holdsince whenthe 4parameters haveidenticalvalue,∀
i∈ [
1,4],
Hi(
a,
b,
c,
d)
=
0. Then,tohaveaconciseBooleandefinitionfor“thereisamajorityofvaluesamongtheparametersa,
b,
c,
d andthe param-eter inposition i belongsto thismajorityof values”,weneed to considerthecasewhere all thevaluesare identicalby usingthefollowingformula:Table 3
H4,Eq andE ven4truthvalues.
H4 Eq Even4 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 1 1 1 1 0 1 1 0 0 0 1 0 1 1 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 1 1 1 0 1 1 1 0 0 0 0 0 1 1 0 1 1 0 1 1 1 1 0 0 0 0 1 1 1 1 0 1 1
where Eq
(
a,
b,
c,
d)
=
de f(
a≡
b)
∧ (
b≡
c)
∧ (
c≡
d).
Thus,with E veni we take intoaccountthe specialcasewhereallthe valuesareequal.ThetruthtableofE ven4 isgiveninTable 3.Itisclearthat E ven4 holdsonlywhenthevalueofd belongs to amajorityofthe parameter’svalues.And E ven4 doesnot holdinan opposite situationwherethere isnomajority of valuesasitisthecaseforE ven4(0011)
orE ven4(0110).
The situations where E ven4
(
a,
b,
c,
d)
=
1 exactly coverthe two cases already mentioned whered is identicalto the majorityvalue inthe triple{
a,
b,
c}
(is notthe intruder),namelyeither a=
b=
c, ortwo ofthethree areequal tod.So thefactthat d joins{
a,
b,
c}
,when E ven(
a,
b,
c,
d)
=
1,leavestheresultingsubset aseven asitwas,hencethename,and infactthemajorityisreinforcedbythearrival ofd.Notealsothat E ven4(
a,
b,
c,
d)
isleftunchangedbyanypermutation of{
a,
b,
c}
.Thismeans thatthe orderinginsidetriples doesnotmatter. Besides, E ven4(
a,
b,
c,
d)
=
E ven4(
a,
b,
c,
d)
wherex
=
1 if x=
0 andx=
0 ifx=
1,expressingthat E ven4(
a,
b,
c,
d)
doesnotdependonthewaytheinformationisencoded. Fromnowon, E ven4willbedenotedE ven asthisistheonlyoptionweusewithi=
4.3.2.2. RelationbetweenoddnessandevennessintheBooleancase
The oddness andevenness Boolean functionshave been built by truth tablesinspection. However, these 2 functions exhibitnoticeablelinks.Despitethefactthattheirnamemightsuggestthatoddnessandevennesscapturedualconcepts,it isnotthecasethat E ven
(
a,
b,
c,
d)
≡ ¬
O dd(
{
a,
b,
c},
d).
Infact,therelationsbetweenthe2measuresareasfollows(whereI denotestheinverseparalogydefinedinsection2.1):
Property7.
E ven
(
a,
b,
c,
d)
≡ ¬
O dd(
{
a,
b,
c},
d)
∧ ¬
I(
a,
b,
c,
d)
O dd
(
{
a,
b,
c},
d)
≡ ¬
E ven(
a,
b,
c,
d)
∧ ¬
I(
a,
b,
c,
d)
I
(
a,
b,
c,
d)
≡ ¬
E ven(
a,
b,
c,
d)
∧ ¬
O dd(
{
a,
b,
c},
d)
Thiscanbe easily checkedonthetruth tables.Thisreflects thefact that O dd and E ven and I are mutuallyexclusive. Let usnote that E ven
(
a,
b,
c,
d)
→ ¬
O dd(
{
a,
b,
c},
d),
that¬
E ven(
a,
b,
c,
d)
→ ¬
H4(
a,
b,
c,
d)
andthat O dd(
{
a,
b,
c},
d)
→
¬
H4(
a,
b,
c,
d)
aswell.Amorecompletediscussionoftherelationbetweenoddnessandevennesscanbefoundin[21].
3.2.3. Extensiontonumericaldata
Inordertodealwithgradedtruthvalues,weneedtoextendthepreviousdefinitionofEvengiveninformula(4)tothe casewherethetruthvaluesbelongto
[
0,1]
,aswehavedonefortheoddnessfunction.Adirecttranslationofformula(4),takingmin for
∧
,max for∨
,and1− |
· −
· |
for≡
asinŁukasiewicslogic,leadsto thefollowingexpressionforE ven4:max
(
min(
1− |
min(
a,
b)
−
min(
1−
c,
d)
|,
1− |
min(
1−
a,
1−
b)
−
min(
c,
1−
d)
|),
1
− |
max(
a,
b,
c,
d)
−
min(
a,
b,
c,
d)
|)
Letusexaminethebehaviorofthisdefinition.Inordertogetaclearpicture,weconsiderthe2followingcurves:
•
f(
x)
=
E ven(0,
x,
x,
x):
we wouldexpect f toget theconstant value 1,since, whateverits value,the last element x,Fig. 1. f and g functions with standard definition of E ven.
Fig. 2. f and g functions with the new definition of E ven.
•
g(
x)
=
E ven(0,
x,
x,
0): weexpect afunction decreasingfrom1 to 0 when x goesfrom0 to 1.Indeed,thesmaller x,thecloserto 1 E ven(0,x
,
x,
0) shouldbe,whilethelarger x themore0 appears tobe equaltotheminorityvalue in themultiset{
0,x,
x}
.The corresponding curves are inFig. 1. Black square dots are forfunction f , empty circles forfunction g (mindthat an emptycirclemaypartiallyhideablacksquarewhen f and g coincide).Ascanbeseen, f isnotaconstantfunctionandg
isnotmonotonicallydecreasing.ThiscontrastswithO dd
(0,
x,
x,
x)
=
0 and O dd(0,
x,
x,
0)=
0.It appears that a direct translationof the Booleandefinition (4) doesnot fit withthe expectedmeaning ofevenness in the caseof graded truth values. But obviously, we could start from the property E ven
≡ ¬
O dd∧ ¬
I to get another translationas:E ven
(
a,
b,
c,
d)
=
min(
1−
O dd(
a,
b,
c,
d),
1−
I(
a,
b,
c,
d))
ThisnewdefinitionleadsviaaneasycomputationtoE ven
(0,
x,
x,
x)
=
1−
x ifx≤
0.5 and1−
min(
x,
2−
2x)whenx≥
0.5. Withthisnewdefinitionof E ven, thecurvescorresponding to f and g aregiveninFig. 2(weuse blacksquaredots for function f andemptycirclesforfunction g again).Observethatthebehavior ofg
(
x)
=
E ven(0,
x,
x,
0)issatisfactory sincewegetthedecreasingfunction1−
x.Howeverf
(
x)
=
E ven(0,
x,
x,
x)
maybefarfrom1(inparticular,E ven(0,
23,
23,
23)
=
13).Suchabehaviorisnotsatisfactoryatall.Itis still anopenquestiontofindabetterdefinitionfor E ven inthegradedcase,whichwouldcoincidewiththeBooleancase whena,
b,
c,
d∈ {
0,1}
.Forthisreason,weshallnotexperimenttheevennessfunctionwithnumericaldata.3.2.4. Dealingwithmissingvalues
Missinginformationisquitecommoninreallifedatasetsandawaytoextendthesemanticsofanalogicalproportionto dealwiththisissuehasbeendeeplyinvestigatedin[20]forinstance.Infact,suchanapproachcanbealsoappliedhere,as explainednow.
Still keepingalogical approachandconsidering that‘?’ denotes amissingvalue (i.e.an informationisunknown),the idea is toextend thetruth table of the E ven formula asfollows: E ven
(?,
0,0,0)=
E ven(0,
?,0,0)=
E ven(0,
0,?,0)=
1,E ven
(?,
1,1,1)=
E ven(1,
?,1,1)=
E ven(1,
1,?,1)=
1, and E ven(
x,
y,
z,
t)
=
0 for any other patternincluding atleasta missing value ‘?’. It isclearthat, withthe 6 firstpatterns, whateverthe candidatevalue of themissing feature,the 4th argument belongs to the majority and cannot be an intruder. In all the remaining cases, where we have no certainty regardingthestatusofd,weadoptacautiousbehaviorbyconsideringthat E ven doesnothold.4. Heterogeneousproportions-basedclassifiers
Intheprevioussection,wehavedefinedtwonewindexesthatevaluatetheoddness orevenness ofanitemwithrespect toa multiset S of afixed size(especially S is a tripleincaseofevenness). Inthe contextofclassification, ouraimisto maintainhomogeneityorevennessinsideeachclass
C
andtoavoidoddnesswhenclassifyinganewitem.Forthispurpose, wefirstextendO dd and E ven indexestodealwithvectorsinsteadofsimpleBooleanornumericalvalues,andthen,build upaglobaloddness/evennessmeasureofanitemx w.r.t.aclassC
.Inthefollowing,weproposeafamilyofclassifiersbasedontheseglobalevenness/oddnessmeasuresandindexedbythe sizeofthesubsets(usedinthecomparisonprocessfortheoddness-basedclassifiers).ButwefirstrecalltheBayesianview oftheconformityofanitemw.r.t.aclass,whichcontrastswiththeviewsproposedintherestofthissection.
4.1. TheBayesianviewpoint
Inaclassical Bayesianview,wehave P rob
(C|
x)
=
1Z·
P rob(C)
·
ni=1P rob(
xi|
C)
assumingthat then featuresare inde-pendent.Theevidence Z dependsonlyonx, P rob(
C)
reflectssomecharacteristicsofC
suchasitssize,andni=1P rob(
xi|
C)
evaluatestheconformityofx= (
x1,
· · · ,
xn)
withC.
Undersomeconditionalindependenceassumptions,thisprobabilitycan berewrittenasaweightedproductofP rob(
xi|
C),
i.e.theconditionalprobabilitytogetvalue xiforfeaturei intheclassC
. Usually, P rob(
xi|
C)
isestimatedasthefrequencyofelementshavingvalue xi forfeaturei inthewholeclassC
.Thus,the expressionof P rob(C|
x)
involvestheproductof P rob(C)
withakindofconjunctivecombinationexpressedastheproduct oftheproportionofelementsofC
identicaltox foreachfeaturei.A counterpartofthisevaluationexistsinthesettingof possibilitytheory[2].InthecaseofBooleanfeatures,let p(
C,
i)
betheproportionofthemajorityvalueforfeaturei inC
. Then,anelementaryestimationof P rob(
C|
x)
is:1 Z
·
P rob(
C
)
·
i∈Mp(
C
,
i)
·
j∈M(
1−
p(
C
,
j))
whereM
⊆ {
1,· · ·,
n}
isthesubsetoffeatureswherex isconformtothemajorityinC
,andM isthecomplementarysubset wherex is notconform tothemajority. Theideaofconformityin thisapproach isthusrelatedtothenotion ofmajority w.r.t. thewhole setC
itself.Inthefollowing,weinvestigatetheideaofjudgingconformityw.r.t. acollectionofsmallersubsetsS ⊂ C
,andthentocumulatetheresultsofthecomparisonofx withthedifferentsubsetsS
.4.2. Classificationindexes
When it comes to real life application, it is not enough to represent individualswith a single Boolean or realvalue. Generally,individualsareencodedbyasetoffeatures.Basedonthepreviouslydefinedoddnessandevennessmeasures,we havetodefinenewmeasuressuitableforvectors.
4.2.1. Oddnessandevennessmeasuresforvectors
Whendealing withvectors
→
x∈ [
0,1]
n,Booleanvectorsarealsocoveredasaparticularcase.The O dd and E venmea-sures,definedrespectivelyby(1)and(4),areusedtoestimatetowhatextentavalue x canbe consideredasodd oreven
amonga multiset S of values.Thanks tothetwo latterformulas,assuming the independenceoffeatures,it isnaturalto computetheoddness orevenness ofavector
→
x asthesum oftheoddness orevenness foreachfeaturexi∈ →
x ,asfollows:O dd
(
S
, →
x)
=
de fin=1O dd
(
Si,
xi)
∈ [
0,
n]
E ven(
S
, →
x)
=
de fni=1E ven
(
Si,
xi)
∈ [
0,
n]
wherexi isthei-thcomponentof
→
x ,Siisthemultisetgatheringthei-thcomponentsofthevectorsinS
.Notethatinthe caseofE ven,thesetS
hasexactly3elements,butwekeepthesetnotationforsakeofnotationuniformity.Ifouraimistomeasure oddness,highvaluesof O dd
(
S,
→
x)
(closeton)meansthat,formany features,→
x appearsas an intruderandmayreducethehomogeneitywhengoing fromS
tothemultisetS ∪ {→
x}
.If O dd(
S,
→
x)
=
0,nofeature indicatesthat→
x behavesasanintruderandthereisnoobstaclefor→
x tojointhemultiset S.1Ontheopposite, ifouraimisto computeevenness ofthesubset
S
when→
x isbeingadded,the bigger E ven(S,
→
x),
thelargerthenumberoffeatures forwhich→
x conformstothe majorityinS
,thebetter→
x conformstovectorsinS
.IfE ven
(S,
→
x)
=
n,theredoesnotexista featurewhere→
x behavesasanintruder.Then, itisacceptablefor→
x to jointhe setS
.1 Itisclearthatwhendealingwithclassificationtask,Sisjustasetofexamples,withoutanyrepetition,butobviously,itsprojectionscomponentwise, theSi’saremultisetsofBooleanorrealvalues.