• Aucun résultat trouvé

Oddness/evenness-based classifiers for Boolean or numerical data

N/A
N/A
Protected

Academic year: 2021

Partager "Oddness/evenness-based classifiers for Boolean or numerical data"

Copied!
21
0
0

Texte intégral

(1)

OATAO is an open access repository that collects the work of Toulouse

researchers and makes it freely available over the web where possible

Any correspondence concerning this service should be sent

to the repository administrator: tech-oatao@listes-diff.inp-toulouse.fr

This is an author’s version published in: http://oatao.univ-toulouse.fr/24811

To cite this version: Bounhas, Myriam and Prade, Henri and

Richard, Gilles Oddness/evenness-based classifiers for Boolean or

numerical data. (2017) International Journal of Approximate

Reasoning, 82. 81-100. ISSN 0888-613X

Official URL:

https://doi.org/10.1016/j.ijar.2016.12.002

(2)

Oddness/evenness-based

classifiers

for

Boolean

or

numerical data

Myriam

Bounhas

a

,

b

,

Henri

Prade

c

,

d

,

,

Gilles

Richard

c aLARODECLaboratory,ISGdeTunis,41ruedelaLiberté,2000Le Bardo,Tunisia

bEmiratesCollegeofTechnology,P.O.Box:41009,AbuDhabi,UnitedArabEmirates cIRIT,UPS-CNRS,118routedeNarbonne,31062ToulouseCedex,France dQCIS,UniversityofTechnology,Sydney,Australia

a

b

s

t

r

a

c

t

Keywords: Logicalproportion Classification Booleandata Numericaldata

k-Nearestneighborsmethod

In this paper, we propose two viewpoints for estimating to what extent a new item, described intermsofbinary-valued features, fits with asetof existingitems. Theyare respectively basedonan oddnessindexand an evennessindex, whichinspite oftheir names, are not exactly the oppositeof each other. Both indicators, whichreferto one feature,arebuiltfromheterogeneouslogicalproportions,andinvolvefouritems,thenew itemandthreeothers.LogicalproportionsareBooleanfunctionsthatrelatefourvariables through comparisonsbetween pairs of them.Heterogeneous onesexpress thatthere is anintruderamongfourtruthvalues,whichisforbiddentoappearinaspecificposition. Global oddnessand evennessfunctions of anitemwith respectto aset are built from thecorrespondingindexesbytakingallfeaturesintoaccount,andthenbyconsideringall triplesofitemsintheset.Moreovertheoddnessfunctionnaturallyextendstonumerical featuresandtosubsetsofitemsofdifferentsizes(pairs,triples,etc.).Simpleclassification procedures canbebasedontheseglobalfunctions: anewitemisassignedto theclass thatminimizesoddnessormaximizesevenness.Experimentsonclassicalbenchmarkswith Boolean,ornumericaldata(foroddness)showthattheresultsarecompetitivewithother classificationmethods.

1. Introduction

Ithasbeenacknowledgedforalongtimethat proportionsplayan importantroleinourperceptionandunderstanding of reality. Indeed proportions are a matter of comparisons expressed by differencesor ratios that are equated to other differencesorratios.Twocenturiesago,Gergonne[10,11]wasthefirsttoexplicitlyrelatenumerical(geometric)proportions totheideasofinterpolationandregression.

Itisonlyinthelastdecadethatanalogical proportions,i.e., statementsoftheform A isto B asC isto D,whereeach capitalletter refers to asituation described by a vectorof featurevalues, havebeenformalized firstin termsof subsets of properties that hold true in a given situation [12,26], andthen in a logical manner [16]. Quite early, it was shown

*

Correspondingauthorat:IRIT,UPS-CNRS,118routedeNarbonne,31062ToulouseCedex,France.

E-mailaddresses:Myriam_Bounhas@yahoo.fr(M. Bounhas),prade@irit.fr(H. Prade),richard@irit.fr(G. Richard).

(3)

that a formal view of analogical proportions maybe the basis of a new type of classifier that performs well on some difficult benchmarks [1,15].This was confirmed by other implementations directly based on a logicalview ofanalogical proportions[3].

Besides, itwas shownthat analogical proportionsbelong toa larger familyofso-calledlogicalproportions thatrelate a 4-tupleofBooleanvariables[17],wherethe8code-independentlogicalproportionsareofparticularinterestsincetheir truth status remain unchangedif a property is encoded positively or negatively.These 8 logical proportions divide into 4 homogeneous proportions,which includethe analogical proportionand3 relatedproportions,and4 heterogeneous

pro-portions [20].A heterogeneous proportionexpresses the idea that thereis an intruder amongthe 4truth values, which is forbiddentoappearina specificposition.Intuitivelyspeaking,an item properlyassignedtoa classshouldnot be (too much)anintruderinthisclass.Itsuggeststhatheterogeneousproportionsmaybealsoofinterestasabasisfordesigning anewtypeofclassifier.Thisisthetopicofthepaper.

It isa commonsenseprincipletoconsiderthataclass cannotbereasonablyassignedtoanewitem ifthisitemwould appear to be at odds with respectto the known members of theclass. On the contrary, theitem should be even with respect to theseclass members forentering the class. The useof proportions leads tothe ideaof considering triples of elements ina class asa basis forestimating theevenness orthe oddness ofa new item withrespect tothe class. This departs from theusual view wherethe estimationof the(non) agreementof anewitem withrespect toa set ofitems amountstocomparetheitem,featurebyfeature,withadistributionofthefeaturevaluesinthewholeset.

Anoddnessindexandanevennessindex,whicharenottheexactoppositeofeachother,areproposed.Intheevenness view,triplesaretheonlysubsetswherewhenthenewitemconformswiththeminorityforagivenBooleanfeature,there isnolongeranymajority(withrespecttothisfeature)inthetripleaugmentedwiththenewitem.Thenonecanestimate towhatextentanewitemfitswiththemajorityofelementsinanytripleofmembersofaclassonasetoffeatures.

However, it isunclear howto extendtheevenness-based approachto numericaldata.A slightly differentview,based onthedirectestimationofoddness,whichcanstillberelatedtoheterogeneouslogicalproportions,canbealsoconsidered. Thisleadstoanoddnessmeasurethatcanbeextendedtonumericalfeaturesinastraightforwardmanner,andthatcanbe alsogeneralizedtosubsetsofanysizeandnotonlyontriples.Thusinthispaper,theevaluationoftheevennessorofthe oddness ofanitem withrespecttoa classreliesonalocal view,wherethe newitem, shouldappeareven/notappearat oddswithamaximumnumberof(small)subsetsofaconsideredclass.

The paperisorganizedasfollows.Thenext sectionprovides thenecessarybackgroundonBooleanlogicalproportions, introducingthetwotypesofproportions:thehomogeneousonesandtheheterogeneousones,byespeciallyemphasizinga code independencyproperty.Results areestablished that singleout theseproportionsintermsofparticular featuresthat are meaningfulwhenitcomesto classification.Then,basedonheterogeneousproportions,oddnessandevennessindexes are introducedinSection 3,andtheirexact relationshipestablished.The extensionofoddness andevenness indexestoa numericalfeatureisdiscussed.Section4describesheterogeneousproportions-basedclassifiers.Weshowhowtheproposed oddness andevennessindexescanprovidea basisforestimatingtheoddnessortheevenness ofanitem withrespectto a wholeclass. Therelatedwork Section5 providesa briefoverviewofanalogicalproportions-basedclassifiers, which are based on ahomogeneous logical proportion, butwork quite differentlyfromheterogeneous proportions-basedclassifiers. Section 6isdevotedtoa setofexperimentsonstandardbenchmarkscomingfromtheUCIrepository.Theyarecompared both with classical classifiersandwithanalogical proportions-basedclassifiers. Finally, we provide some hintsfor future worksandconcludingremarksinSection7.

Thispapergathersandsubstantiallyextendsapproachesandresultspartiallyreportedinthreeconferencepapers[5–7].

2. Heterogeneousproportionsvs.homogeneousproportions

Logical proportionsare thebasicingredientsofourapproach.TheseproportionsareBooleanformulasinvolving4 vari-ables.Theyhavebeendeeplyinvestigatedin[19].Inthefollowingsection,wefirstrecallhowtheyarebuilt,thenwefocus ontheproportionsthatwewilluseinthispaper.

Asfornotations,thepaperusesthestandardonesforBooleanconnectives,namely

∨,

∧,

≡,

fordisjunction, conjunc-tion,equivalenceandmaterialimplicationrespectively.

2.1. Backgroundonlogicalproportions

Alogicalproportionstatesarelationbetween4itemsthatisexpressedintermsofcomparisonsbetweenpairsofitems, eachitembeingrepresentedasasetofBooleanfeatures.Considering2Booleanvariablesa andb correspondingtothesame feature attachedto2items A and B,a

b anda

b indicatethat A and B behavesimilarlyw.r.t.thegivenfeature(they are called“similarity”indicators),a

b anda

b thefactthat A and B behave differently(they arecalled“dissimilarity” indicators).Whenwehave4items A

,

B

,

C

,

D,forcomparingtheirrespectivebehaviorinapairwisemanner,weareledto considerlogicalequivalencesbetweensimilarity,ordissimilarityindicators,suchasa

b

c

d for instance.Thisenables ustodefinealogicalproportion[19]:

Definition1.A logicalproportion T

(

a

,

b

,

c

,

d

)

isthe conjunctionoftwo equivalencesbetweenindicatorsfor

(

a

,

b

)

on one sideandindicatorsfor

(

c

,

d

)

ontheotherside.

(4)

Forinstance,

((

a

b

)

≡ (

c

d

))

∧ ((

a

b

)

≡ (

c

d

))

is alogicalproportion. It hasbeenestablished thatthere are120 syntacticallyandsemanticallydistinctlogicalequivalences.Therearetwowaysfordistinguishingremarkablesubsetsamong the120proportions:eitherbyinvestigatingtheirstructure,orbyinvestigatingtheirsemantics(i.e.theirtruthtable).Inthis section, weshallsee thatboth investigationslead tothe sameconclusion:there isa classof8proportions whichstands outofthecrowd.Thisclasscanbesubdividedinto2sub-groupsof4proportions.

Indeed a propertythat appears tobe paramountin manyreasoning tasks is codeindependency: there should be no distinction when encoding information positively or negatively. In other words, encoding truth (resp. falsity) with 1 or with 0(resp.with0and 1)isjustamatterofconvention,andshouldnotimpactthefinalresult.Whendealingwithlogical proportions,thispropertyiscalledcodeindependency andcanbeexpressedas

T

(

a

,

b

,

c

,

d

)

T

(

a

,

b

,

c

,

d

)

Fromastructuralviewpoint,rememberthataproportionisbuiltupwithapairofequivalencesbetweenindicatorschosen among16equivalences. So,toensure codeindependency,the onlywaytoproceed istofirst choosean equivalencethen topairitwithitscounterpartwhereeveryliteralisnegated:forinstancea

b

c

d shouldbepairedwitha

b

c

d

inordertogetacodeindependentproportion.Thissimplereasoningshowsthatwehaveonly16/2

=

8 codeindependent proportionswhoselogicalexpressionsaregivenbelow.

A

: ((

a

b

)

≡ (

c

d

))

∧ ((

a

b

)

≡ (

c

d

))

R

: ((

a

b

)

≡ (

c

d

))

∧ ((

a

b

)

≡ (

c

d

))

P

: ((

a

b

)

≡ (

c

d

))

∧ ((

a

b

)

≡ (

c

d

))

I

: ((

a

b

)

≡ (

c

d

))

∧ ((

a

b

)

≡ (

c

d

))

H1

: ((

a

b

)

≡ (

c

d

))

∧ ((

a

b

)

≡ (

c

d

))

H2

: ((

a

b

)

≡ (

c

d

))

∧ ((

a

b

)

≡ (

c

d

))

H3

: ((

a

b

)

≡ (

c

d

))

∧ ((

a

b

)

≡ (

c

d

))

H4

: ((

a

b

)

≡ (

c

d

))

∧ ((

a

b

)

≡ (

c

d

))

Only4amongtheseproportionsmakeuseofsimilarityanddissimilarityindicatorswithoutmixingthesetypesofindicators insideone equivalence:forthisreason, these4proportions A

,

R

,

P

,

I are calledhomogeneousproportions.Forinstance,an informal readingof A would be:“a differs fromb asc differsfromd and viceversa.” Thisexpresses themeaning ofan analogical proportion, i.e., a statement ofthe form“a is to b asc isto d”. Wecan consider A as aBoolean counterpart tothe ideaofnumericalproportion, eithergeometric,i.e., ab

=

dc,orarithmetic a

b

=

c

d.It mayalsobeviewedasa qualitativeformofcomparisonofdifferences,reminiscenttotheconceptofderivative wherewestudytheratio f(aa)bf(b) with f

(

a

)

=

c and f

(

b

)

=

d,whichisclosetonumericalproportions.

Theideaofaproportionsuggeststhatsome stabilitypropertiesholdw.r.t.permutations.Indeed,wecanpermute vari-ables and check, for instance,if a given proportion still holds when permuting the 2 first variables. We denote pi j the permutationofvariableinpositioni withvariableinposition j.Forinstance,p14 permutesthevariablesinextreme posi-tions1 and4,while p23permutesvariablesinmeanpositions.And p12

(

a

)

=

b

,

p12

(

b

)

=

a

,

p12

(

c

)

=

c

,

p12

(

d

)

=

d.

Definition2.AproportionT isstablew.r.t.permutationpi j iff T

(

a

,

b

,

c

,

d

)

T

(

pi j

(

a

),

pi j

(

b

),

pi j

(

c

),

pi j

(

d

))

Itcan becheckedthat A isstablew.r.t.theextremes p14 orthemeans p23 permutations. P isstableforp12and p34 permutations,while R is stablefor p13 andp24.Moreover A

,

R

,

P

,

I are symmetrical(i.e. T

(

a

,

b

,

c

,

d

)

T

(

c

,

d

,

a

,

b

)).

This isobservableontheirtruthtables:SeethetoppartofTable 1,whereonlythe6patternsthatmakethelogicalproportions true appear). Besides, I is the only logical proportion that is stable w.r.t. any permutation of two ofits variables. This noticeableresultisprovedin[18].

Moreover, R andP arecloselyrelatedto A viapermutations.Namelywehave

A

(

a

,

b

,

c

,

d

)

P

(

c

,

b

,

a

,

d

)

R

(

b

,

a

,

c

,

d

)

Infact,whend isfixed,exchangingthevariablesa

,

b

,

c amountstomovefromonehomogeneousproportiontoanother, orto remainstable( A(a

,

b

,

c

,

d

)

=

A

(

a

,

c

,

b

,

d

);

P

(

c

,

b

,

a

,

d

)

=

P

(

b

,

c

,

a

,

d

);

R

(

b

,

a

,

c

,

d

)

=

R

(

c

,

a

,

b

,

d

)),

with I remainingan exception. Thus A

,

R

,

P collectively maintain a formof exchangeability property withrespect to a

,

b

,

c, while I ensures

it byitself. Theseexchangeabilitypropertiesare ofparticular interestwhen applyinghomogeneous logicalproportionsto classification.

The 4 remaining code independent logical proportions H1

,

H2

,

H3

,

H4 are calledheterogeneousproportions: it is clear fromtheirlogicalexpressionthattheymixsimilarityanddissimilarityindicatorsinsideeachequivalence.Theirtruthtables

(5)

Table 1

Homogeneous/heterogeneousproportionsvalidpatterns.

A R P I 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 1 0 H1 H2 H3 H4 1 1 1 0 1 1 1 0 1 1 1 0 1 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 1 1 0 1 1 1 0 1 1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0

areshowninthebottompartofTable 1,whereonlythe6patternsthatmakethemtrueappear.Theindexi inHirefersto apositioninsidetheformulaHi

(

a

,

b

,

c

,

d

).

Namely,ascanbechecked,ineachofthese6patternsthereisaminorityvalue (i.e.,thevaluehavingthesmallestnumberofoccurrencesinthepattern,thenthisvalueislikeanintruderamongtheother values), andi is theonlypositionwherethe minorityvalue neverappears amongthe64-tuplesofvaluesthat make Hi true.

ByexaminingthetruthtableoftheheterogeneousproportionsinTable 1,wegetthefollowingproperties:

A

(

a

,

b

,

c

,

d

)

R

(

a

,

b

,

c

,

d

)

P

(

a

,

b

,

c

,

d

)

I

(

a

,

b

,

c

,

d

)

= ⊥

togetherwith

(

A

(

a

,

b

,

c

,

d

)

R

(

a

,

b

,

c

,

d

)

P

(

a

,

b

,

c

,

d

))

Eq

(

a

,

b

,

c

,

d

)

where Eq

(

a

,

b

,

c

,

d

)

=

1 ifa

=

b

=

c

=

d and Eq

(

a

,

b

,

c

,

d

)

=

0 otherwise.Similarly,forheterogeneousproportions,wehave

H1

(

a

,

b

,

c

,

d

)

H2

(

a

,

b

,

c

,

d

)

H3

(

a

,

b

,

c

,

d

)

H4

(

a

,

b

,

c

,

d

)

= ⊥

whichimplies:

(

H1

(

a

,

b

,

c

,

d

)

H2

(

a

,

b

,

c

,

d

)

H3

(

a

,

b

,

c

,

d

))

→ ¬

H4

(

a

,

b

,

c

,

d

)

(1) Obviously, we have similar properties by permuting the indexes of the Hi’s. The meaning of the conjunction

H1

(

a

,

b

,

c

,

d

)

H2

(

a

,

b

,

c

,

d

)

H3

(

a

,

b

,

c

,

d

)

will be discussed in the following section, devoted to heterogeneous propor-tions.

2.2. Specificityofheterogeneousproportions

In order to get a clear understanding of the heterogeneous proportions and to extract relevant properties, we now investigatetheirtruthtables.

2.2.1. Heterogeneityandexchangeability

StillwithinTable 1,anobvioussemanticsappears:Hiholdswhenthereareexactly3parameterswithidenticalBoolean values(

=

1 forexample)andtheparameterinpositioni isoneoftheseidenticalvalues.

Definition3.Given4 Boolean values a

,

b

,

c

,

d in thisorder such that 3 of them are identical and the remaining one is different,thepositioni

∈ [

1,4

]

ofthisremainingvalueiscalledtheintruderposition ortheintruderforbrevity.

Then, Hiholdsiffthereisanintruderamongthe4valuesa

,

b

,

c

,

d andtheintruderpositionisnoti.Thissuggeststhat

Hishouldbestablew.r.t.thepermutationswhichdonotaffectpositioni.Infactalittlebitmorecanbeestablished:

Property1. ApartfromI,Hiaretheonlylogicalproportionsstablew.r.tanypermutationwhichdoesnotaffectpositioni.

ThespecialcaseofI stablew.r.t.anypermutationhasbeenalreadyprovedin[19].Table 1allowstocheckthattheHi’s arestablew.r.t.thepermutationswhichdonotaffectpositioni.Showingthattheyaretheonlyonesamongthe120logical propertiesstablew.r.t.thesepermutationsrequiresatediouscheckingprocedurethatcannotbesummarizedhere.

Property 1isquitesatisfactoryandconfirmstheinformalsemanticsofHi.Thiswillbeusefulwhenusingheterogeneous proportionstoclassification.Moreimportantly,thisgivesaclearsemanticstotheconjunctionH1

(

a

,

b

,

c

,

d

)

H2

(

a

,

b

,

c

,

d

)

(6)

H3

(

a

,

b

,

c

,

d

)

above: H1

(

a

,

b

,

c

,

d

)

H2

(

a

,

b

,

c

,

d

)

H3

(

a

,

b

,

c

,

d

)

holdsiffamongthe4valuesa

,

b

,

c

,

d,thereisan intruder andthisintruderisd.Thiswillbethebasisofouroddnessmeasure.Inthenextsubsection,weestablishsomeresultsabout theparityofthenumberof1or0intruthtablesforheterogeneousproportions,whicharecontrastedwithhomogeneous proportions.Thisleadstoamodelofoddnessofagivenvalue,amongamultisetof4values.

2.2.2. Parityofthenumberof1or0intables

SincelogicalproportionsareBooleanformulasinvolving4variables,theirtruthtableshave16rows,whereonly6lead to1(see[19]foracompleteinvestigation).Onecouldaskifanytruthtablehaving6linesleadingto1and10linesleading to0corresponds toalogicalproportion.A simplenumberingargumentshowsthatthisisnotthecase.Ontopofthat,we canbuildclassesofpatternswhichcannotbevalidforanyproportion:

Property2. Thereisnologicalproportionthatistrueforthefourelementsofthesetofvaluations

{

0111,1011,1101,1110

}

.The sameholdsfor

{

1000,0100,0010,0001

}

.

Proof. Anequivalencebetweenindicatorsisoftheforml1

l2

l3

l4.Ifthisequivalenceisvalidfor

{

0111,1011

}

,itmeans that itstruth value doesnot changewhenwe switch thetruth value ofthe2 firstliteralsfrom0 to1: thereare only2 indicatorsfora andb satisfyingthisrequirement:a

b anda

b.Ifthisequivalenceisstillvalidfor

{

1101,1110

}

,itstruth valuedoesnotchangewhenweswitchthetruthvalueofthe2lastliteralsfrom0 to1:thereareonly2indicatorsforc and d satisfyingthisrequirement:c

d andc

d.Thentheequivalencel1

l2

l3

l4isjusta

b

c

d,a

b

c

d,a

b

c

d ora

b

c

d.Noneoftheseequivalencesistrueforthefourelementsofthesetofvaluations

{

0111,1011,1101,1110

}

. Thesamereasoningisstillapplicablefortheotherclass.

2

Applying a similar reasoning, we can build other set of valuations which cannot make simultaneously true a logical proportion.

Property3. Alogicalproportioncannotbemadetruebyasetof4valuationsincluding3valuationsofoneoftheclassesappearingin

Property 2andwherethe4thvaluationisjustthenegationcomponentwiseoftheremainingvaluationoftheclass.

Forinstance,thereisnologicalproportiontruefor

{

0111,1011,1101,0001

}

orfor

{

0111,0100,1101,1110

}

. Thisremarkhelpsestablishingthefollowingresult:

Property4. Heterogeneousproportionsaretheonlyproportionswhoseallvalidpatternshaveanoddnumberof1.

Proof. Fromthetruthtables, weobservethatonlyvalidpatternsforheterogeneousproportionshaveanoddnumberof1. Letusnowconsideraproportionwhose6validpatternscarry anodd numberof1. Asthereare exactly8patternswith anoddnumberof1,andthankstothepreviousproperty,thisproportionincludesnecessarily3patternsfromeach ofthe previousclasses.Ifthevalidpatternsinoneclassareobtainedfromthevalidpatternsfromtheotherclassjustbynegating all the variables,the proportionis codeindependent andthen, it isa heterogeneous proportion. Inthe opposite case, it means that we have atleast one pattern inthe first class withno negated counterpart in the other class: for instance, 1110,1101,1011 arevalidbut0001 isnota validpattern, leavingonly1000,0100,0010 tocompletethetruth tableofa logicalproportion.ThenProperty 3tellsthatthereisnoproportionvalidfor1110,1101,1011,1000.

2

Asimilarpropertyholdsforhomogeneousproportions:

Property5. Homogeneousproportionsaretheonlyproportionswhoseallvalidpatternshaveanevennumberof1.

Wenowshowthespecificityofheterogeneous(andhomogeneous)proportionsfromareasoningpointofview.

2.2.3. Inferenceandunivocalproportions

Thereisawaytoinferunknownpropertiesofapartiallyknownobject D startingfromtheknowledge wehaveabout itsotherspecifiedproperties,andassumingthatalogicalproportionT holdscomponentwisewiththreeotherobjectsA, B, C ,alsorepresentedintermsofthesamen Booleanfeatures.Thiscanbedoneviaaninductionprinciplethatcanbestated asfollows(where J isasubsetof

[

1,n

]

,andxidenotesthetruthvalueoffeaturei forobjectX

∈ {

A

,

B

,

C

,

D

}

):

i

∈ [

1

,

n

] \

J

,

T

(

ai

,

bi

,

ci

,

di

)

i

J

,

T

(

ai

,

bi

,

ci

,

di

)

Thiscan be seen asa continuityprinciple assuming that ifit isknown thata proportionholds forsome attributes, this proportionshouldstillholdfortheotherattributes.Itgeneralizestheinferenceprincipleusedwiththeanalogicalproportion

(7)

no guaranteethatthe conclusionholdswhen thepremisseshold. Nevertheless,speciallywhentheratio |nJ| is closeto 1, which meansthat proportionsholdon alarge numberofattributes, itis naturalto considerthat such aproportionmay alsoholdonthesmallnumberofremainingattributes.

This principle requires the unicity of the solution of equation T

(

a

,

b

,

c

,

x

)

=

1 where x is unknown, when it exists. Namely,given3Booleanvaluesa

,

b

,

c,wewanttodetermineforwhatlogicalproportionT theequation T

(

a

,

b

,

c

,

x

)

=

1 is solvable,andinsuchacase,ifthesolutionisunique.

Definition4.If, whentheequation T

(

a

,

b

,

c

,

x

)

=

1 issolvable, thesolutionisunique, thentheproportionT is saidto be 4-univocal. Inasimilarmanner,onemaydefineproportionsthatare1,2,or3-univocal.T isunivocal whenitisi-univocal

foreveryi

∈ [

1,4

]

.

Firstofall,itiseasytoseethattherearealwayscaseswheretheequationT

(

a

,

b

,

c

,

x

)

=

1 hasnosolution,whateverthe proportionT .Indeed,thetriplea

,

b

,

c maytake23

=

8 values,whileanyproportionT istrueonlyfor6distinctvaluations, leavingatleast2caseswithnosolution.Forinstance,whenwedealwithH4,theequationsH4

(0,

0,0,x

)

andH4

(1,

1,1,x

)

havenosolution.

Wehavethefollowingresult:

Property6. Thehomogeneousandtheheterogeneousproportionsaretheonlyproportionswhichareunivocal.

Proof. From thetruthtables, weseethat the2typesofproportionssatisfytheproperty. Now,a proportionwhich isnot

i-univocalis necessarilyvalidboth forapatternwithan odd numberof1,andfora patternwithan evennumber of1. Then,Properties 4 and5excludehomogeneousandheterogeneousproportions.

2

Atthisstep,weseethatheterogeneousproportionsallowtosingleoutaparticularpositionamonganorderedlistof4 values.Thispositiontargetsthevaluewhichisdefinitelynotanintruderamongthemultisetof4items.Forinstance,when

Hiisvalid,thevalueinpositioni isnotanintruder.Weshallseeinthefollowingsectionthatthispropertycanbeusedto checktheoddnessofagivenitemw.r.t.amultisetofelements.

3. Evaluatingthepresenceortheabsenceofanintruderinamultiset

Based onheterogeneousproportions,we definean oddnessindexandthenan evennessindex, beginningineach case with Booleanvalues, beforeinvestigating their extension to dealwith gradedtruth values.We also laybare the relation betweenevennessandoddness.

TheseoddnessandevennessindexespertaintoaBoolean,oranumericalvalue(denoted x ord inthefollowing)w.r.t. a multiset S ofsuch values.Forreasonsexplainedinthissection,it willappeardesirableto keepthismultisetsmall(i.e.,

|

S

|

=

3, ormaybe2intheoddnesscase).Moreover, inthissectionthe valuesx orin S maybe thoughtofasthevalues ofthesamefeaturefordifferentitems.InSection4,weshallbuildoddnessandevennessmeasuresfromtheseindexesby cumulatingthemoverfeatures,andbyconsideringcollectionsofmultisetsS withintheexamplesdescribingthesameclass

C

inatrainingset.

3.1. Oddnessindex

The ideaofoddness introducedbelowdirectlyreliesontheevaluationoftheextenttowhichanewitemtobeadded toasubsetreinforcesitsheterogeneityandsoappearsasanintruder init.

3.1.1. AnoddnessmeasureforBooleandata

Inthefollowing,wefirstdefinean indextoevaluatethe oddness ofanewcomer w.r.t.amultisetofBooleanvaluesvia theheterogeneousproportions,wherethemultisetisatriple.Then,weextendthisindextomulti-valuedlogicinthenext subsection,beforegeneralizingtheextendedoddnessindextomultisetsofanysize.

Let usrememberthe meaning of Hi: Hi holds iffthere isan intruder amonga

,

b

,

c

,

d and the parameterin position

i isnot thisintruder.As showninTable 2,each proportion Hi providesa pieceof knowledgeonthe intruderandwhen combined withother pieces, wecan pickout whichoneis theintruderamonga

,

b

,

c and d.Forexample H1

(

a

,

b

,

c

,

d

)

=

H2

(

a

,

b

,

c

,

d

)

=

H3

(

a

,

b

,

c

,

d

)

=

1 meansthatthereisanintruderwhichisoutofthemultiset

{

a

,

b

,

c

}

. Thenwedefinetheoddnessofd w.r.t.

{

a

,

b

,

c

}

bythefollowingformula:

O dd

(

{

a

,

b

,

c

},

d

)

=

de f H1

(

a

,

b

,

c

,

d

)

H2

(

a

,

b

,

c

,

d

)

H3

(

a

,

b

,

c

,

d

)

(2) Asanimmediateconsequenceofequation(1),wehave:

(8)

Table 2

H1,H2,H3andO dd truthvalues.

a b c d H1 H2 H3 Odd 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 0 1 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 1 0 1 1 1 0 0 1 1 1 0 1 1 1 1 1 1 1 1 0 0 0 0

DuetothepermutationpropertiesoftheHi’s,therighthandsideofthisdefinitionisstablew.r.t.anypermutationofa

,

b

,

c, thenthemultisetnotationonthelefthandsideisjustified.Thetruthtableof O dd isgiveninTable 2.

Itisclearthat O dd holdsonlywhenthevalueofd isseenasodd amongtheothervalues:d istheintruder.Moreover

O dd does not hold inthe opposite situation where there is a majority amongvalues in a

,

b

,

c

,

d and d belongs to this majority(e.g. O dd

(

{

0,1,0

},

0)

=

0),orthereisnomajorityatall(e.g. O dd

(

{

0,1,1

},

0)

=

0).

AsimpleobservationofTable 2showsthattheoddnessindexcanberewrittenas

O dd

(

{

a

,

b

,

c

},

d

)

≡ ((

a

b

c

)

d

)

∧ (

a

b

c

)

d

))

(3) Givenamultiseta

,

b

,

c of3identicalBooleanvalues, O dd

(

{

a

,

b

,

c

},

d

)

canthenactasaflagindicatingifthe4thvalued

isdifferentfromthecommonvalueofa

,

b

,

c.Thenthevalued isatoddsw.r.t.theothervalues.

3.1.2. Extensiontonumericaldata

It is possible to extend the previous oddness measure in order to handlevariables with graded values (i.e. variables whosevaluesbelongto

[

0,1

]

,afterasuitablenormalizationofnumericaldata).Fornow,thisoddnessisjust0 or1 (i.e.the truthvalue of O dd

(

{

a

,

b

,

c

},

d

)),

butwe wouldliketoconsidertuplessuch as

(0.1,

0.2,0.1,0.8)andstillconsiderthatthe 4thvalueissomewhatoddw.r.t.the3otherones.

Adirecttranslationofformula(2),takingmin for

,max for

,and1

− |

· −

· |

for

asinŁukasiewiczlogic[23],leads to:

O dd

(

{

a

,

b

,

c

},

d

)

=

de f min

(

|

max

(

a

,

b

,

c

)

d

|, |

min

(

a

,

b

,

c

)

d

|)

(4) Firstofall,itisaneasygametocheckthat O dd remainscodeindependentwithgradedvalues,i.e.changingvaluesinto theircomplementto1.

Letusexamine some examples toget aprecise understanding ofthe formulafornumericaldata andtocheck ifthis

oddness measurefitswiththeintuition.

We see that O dd

(

{

u

,

u

,

u

},

v

)

= |

u

v

|

.Indeed, ifu

=

v then obviouslythe 4thvalue isnot an intruder. The larger

|

u

v

|

,themorev isatoddsw.r.tthe3valuesequaltou.

Weseealsothat O dd

(

{

v

,

u

,

u

},

v

)

=

0 whichisconsistentwiththeexpectedsemanticsof O dd.

Generally,O dd

(

{

u

,

v

,

w

},

max(u

,

v

,

w

))

=

O dd

(

{

u

,

v

,

w

},

min(u

,

v

,

w

))

=

0,andinanycase, O dd

(

{

u

,

v

,

w

},

u

)

0.5.

Letusnowconsider anumericalsituation with4differentnumericalvalues,forinstance:

(

{

0,0.1,0.2

},

0.9). Wefeel that d

=

0.9 appears asan intruder in the multiset

(

{

0,0.1,0.2

})

. This is consistent with the obtained truth value

O dd

(

{

0,0.1,0.2

},

0.9)

=

0.7.Moreover, O dd

(

{

0,0.1,0.1

},

0.9)

=

0.8 andO dd

(

{

0,0.1,0.3

},

0.9)

=

0.6,whichfitswiththe intuition.

Conversely,thepattern

(

{

0.7,1,1

},

0.9)doesnotstronglysuggest0.9 asanintrudervalue.IndeedO dd

(

{

0.7,1,1

},

0.9)

=

0.1. O dd

(

{

0.9,1,1

},

0.7)

=

0.2 isabithigher,asexpectedsincewehavemovedtowardsmoreuniformityamonga

,

b

,

c

andslightlyincreasedthedifferencesbetweend andtheelementsof

{

a

,

b

,

c

}

.Moreover,notethatO dd

(

{

0.7,1,1

},

0.9)

=

0.1, and O dd

(

{

0,0,1

},

0.9)

=

0.1, since the two cases illustrate two different ways of not being really an intruder. Indeed, although 0.9 is close to the majority value in

{

0.7,1,1

}

in the first case, and far from the majority value in

{

0,0,1

}

in thesecond case, closeness to majorityvalue in

{

a

,

b

,

c

}

isnot atall what O dd estimates.Ratherit is expectedtofindsimilarestimatesinthetwoabovecases,sincetheyarerespectivelycloseto O dd

(

{

1,1,1

},

1)

=

0 and to O dd

(

{

0,0,1

},

1)

=

0 asshowninTable 2.

Finally, O dd

(

{

a

,

b

,

c

},

d

)

does not behave as

|

d

average

(

{

a

,

b

,

c

})|

: the cases

{

a

,

b

,

c

}

= {

0.5,0.5,0.5

}

, d

=

0.5, and

(9)

defini-tion, O dd

(

{

0.5,0.5,0.5

},

0.5)

=

0, while O dd

(

{

0,0.5,1

},

0.5)

=

0.5. Thus O dd

(

{

a

,

b

,

c

},

d

)

is a more accurate oddness measurethan

|

d

average

(

{

a

,

b

,

c

})|

when theset

{

a

,

b

,

c

}

containsheterogeneousvalues.

From the previous examples,we understandthat theproposed definitionfits withtheinitial intuitionandprovides high truth valueswhend appearstobe atodds w.r.t.themultiset

{

a

,

b

,

c

}

andlowtruth valuesintheopposite casewhered

isnot verydifferentfromtheother values.Ontopofthat,theexpression ofO dd givenhereisnolongertheconjunction ofthemultiple-valuedextensions ofH1

,

H2

,

H3asgivenin[20],whichwouldleadtoalesssatisfactorymeasureofoddness. Indeed, we are here interested in the oddness ofd w.r.t. a multiset

{

a

,

b

,

c

}

, andnot in picking out an intruder in the multiset

{

a

,

b

,

c

,

d

}

asin[20].

3.1.3. Oddnesswithrespecttomultisetsofvarioussize

Since weare interested incheckingifd seemsan intruderina givenmultiset,wemayconsidermultisets ofanysize whendefining theoddnessindex.Indeed,theprevious oddnessindexisnotlimitedtomultisets

{

a

,

b

,

c

}

with3elements, andcan beeasily generalizedtoan indexofoddness O dd

(

S

,

x

)

ofan item x w.r.t.a multiset S ofvalues in

[

0,1

]

ofany size,asfollows:

O dd

(

S

,

x

)

=

de f min

(

|

max

(

S

)

x

|, |

min

(

S

)

x

|)

As canbe seen,weonlycompare x totheupperandlowervaluesin S,whichmaybe reallyconsidered asameaningful summary of S onlyifS isvery small(when wehavenoadditionalinformationaboutthedistributionofvaluesin S),i.e.

|

S

|

=

1,2,3 ormaybe 4.Clearly,thecomputationof O dd

(

S

,

x

)

reflectsthesmallestdistanceofx toanelementin S only

for

|

S

|

=

1,2.Indeed,thisisnotthe caseassoonas

|

S

|

3 ascan beseen onthefollowing example: O dd

(

{

a

,

b

,

c

},

x

)

=

O dd

(

{

0,0.5,1

},

0.5)

=

0.5,while

|

b

x

|

= |

0.5

0.5

|

=

0.

As itmaybe the caseforrealdatasets,we mayhavemissingvalues.Obviously,whenthere isa missingvalue inthe multiset S ofsizen,thenasimpleoptionistoconsiderthemultiset S ofsizen

1 andtoconsider O dd

(

S

,

x

)

=

O dd

(

S

,

x

)

where S hasnolongeranymissingvalue.

3.2. Evennessindex

Adopting a dual viewpoint, we may want to know if adding a newelement to a given subset of items keeps it as homogeneousasitis,i.e.,thenewcomerdoesnotappearasanintruderinthissubset,andratheragreeswithitsmajority. Homogeneity can be considered as a kind of evenness of the newcomer w.r.t. the existing items of the subset. In this subsection,we advocatea wayofjudgingevenness onthe basisofthe majority,ifany,insidethe triples.Contrary tothe oddness definition, whereall Hi

,

i

=

1,2,3 arerequired to define the oddnessindex, only H4 is neededfordefining an evennessindex,thusdenotedE ven4.

3.2.1. AnevennessmeasureforBooleandata

Since theidea istoagree withamajority, wenoticethat thesmallestmultisets S ofelements wheremajoritymakes senseareclearlytriples.LetconsiderthreeBooleanvaluesa,b,c inS.Then,inaBooleanworld,therearetwopossibilities, eithera

=

b

=

c,ortwoofthethreeareequal.Inbothcases,a strictmajoritytakesplace.Letm denotethemajorityvalue. Now consider the newcomer d,either d

=

m, and m remains the majorityvalue in

{

a

,

b

,

c

,

d

}

, ord

=

m, and thereisno longeranymajorityin

{

a

,

b

,

c

,

d

}

(twovaluesareequalto1andtwovaluesto0).Onlywiththefirstcase,d conformstothe majority.

Notethat ifweconsiderlarger subsets S,evenwithonly4elements ratherthan3,itbecomes possiblethatthe new-comer increasesan existing minority,withoutchanging themajority.Indeed,themajorityvalue that maybe sharedby 3 elements inthe 4-elementsmultisetwillthen remainunchangedinthe5-elements multisetresultingfromthearrival of a fifthelementwhateveritsvalue. A similarphenomenontakesplaceifwe startwithlarger subsetsS having5elements ormore.Sowearelosingadistinctivepropertyof3-elementssubsetswhichhaveadifferentmajoritybehaviordepending ifd conformsornottothemajorityinthe3-elementssubset. Thismeansthattriplesaretheonlysubsetssuchthataddingan itemthatconformstothetripleminoritydestroysthemajority.Thus,3-elementssubsetsareabletoclearlydiscriminate,among differentd thosethatconformtothemajorityofthetriple.

Theideaofmajorityjustdescribedhelpsustodefineanewevenness measureviatheheterogeneousproportions.Letus recallthesemanticsof Hi: Hi holdsiffthereisanintruderamonga

,

b

,

c

,

d andtheparameterinpositioni isnot this in-truder.Asaconsequence,Hiimpliesthatthereisamajorityofvaluesamong

(

a

,

b

,

c

,

d

)

andthevalueinpositioni conforms tothemajorityofvaluesappearingamongthe3otherpositions(i.e.themultisetofvalues

{

a

,

b

,

c

,

d

}

ismoreorlesseven). But thereverse implicationdoesnot holdsince whenthe 4parameters haveidenticalvalue,

i

∈ [

1,4

],

Hi

(

a

,

b

,

c

,

d

)

=

0. Then,tohaveaconciseBooleandefinitionfor“thereisamajorityofvaluesamongtheparametersa

,

b

,

c

,

d andthe param-eter inposition i belongsto thismajorityof values”,weneed to considerthecasewhere all thevaluesare identicalby usingthefollowingformula:

(10)

Table 3

H4,Eq andE ven4truthvalues.

H4 Eq Even4 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 1 1 1 1 0 1 1 0 0 0 1 0 1 1 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 1 1 1 0 1 1 1 0 0 0 0 0 1 1 0 1 1 0 1 1 1 1 0 0 0 0 1 1 1 1 0 1 1

where Eq

(

a

,

b

,

c

,

d

)

=

de f

(

a

b

)

∧ (

b

c

)

∧ (

c

d

).

Thus,with E veni we take intoaccountthe specialcasewhereallthe valuesareequal.ThetruthtableofE ven4 isgiveninTable 3.Itisclearthat E ven4 holdsonlywhenthevalueofd belongs to amajorityofthe parameter’svalues.And E ven4 doesnot holdinan opposite situationwherethere isnomajority of valuesasitisthecaseforE ven4

(0011)

orE ven4

(0110).

The situations where E ven4

(

a

,

b

,

c

,

d

)

=

1 exactly coverthe two cases already mentioned whered is identicalto the majorityvalue inthe triple

{

a

,

b

,

c

}

(is notthe intruder),namelyeither a

=

b

=

c, ortwo ofthethree areequal tod.So thefactthat d joins

{

a

,

b

,

c

}

,when E ven

(

a

,

b

,

c

,

d

)

=

1,leavestheresultingsubset aseven asitwas,hencethename,and infactthemajorityisreinforcedbythearrival ofd.Notealsothat E ven4

(

a

,

b

,

c

,

d

)

isleftunchangedbyanypermutation of

{

a

,

b

,

c

}

.Thismeans thatthe orderinginsidetriples doesnotmatter. Besides, E ven4

(

a

,

b

,

c

,

d

)

=

E ven4

(

a

,

b

,

c

,

d

)

where

x

=

1 if x

=

0 andx

=

0 ifx

=

1,expressingthat E ven4

(

a

,

b

,

c

,

d

)

doesnotdependonthewaytheinformationisencoded. Fromnowon, E ven4willbedenotedE ven asthisistheonlyoptionweusewithi

=

4.

3.2.2. RelationbetweenoddnessandevennessintheBooleancase

The oddness andevenness Boolean functionshave been built by truth tablesinspection. However, these 2 functions exhibitnoticeablelinks.Despitethefactthattheirnamemightsuggestthatoddnessandevennesscapturedualconcepts,it isnotthecasethat E ven

(

a

,

b

,

c

,

d

)

≡ ¬

O dd

(

{

a

,

b

,

c

},

d

).

Infact,therelationsbetweenthe2measuresareasfollows(where

I denotestheinverseparalogydefinedinsection2.1):

Property7.

E ven

(

a

,

b

,

c

,

d

)

≡ ¬

O dd

(

{

a

,

b

,

c

},

d

)

∧ ¬

I

(

a

,

b

,

c

,

d

)

O dd

(

{

a

,

b

,

c

},

d

)

≡ ¬

E ven

(

a

,

b

,

c

,

d

)

∧ ¬

I

(

a

,

b

,

c

,

d

)

I

(

a

,

b

,

c

,

d

)

≡ ¬

E ven

(

a

,

b

,

c

,

d

)

∧ ¬

O dd

(

{

a

,

b

,

c

},

d

)

Thiscanbe easily checkedonthetruth tables.Thisreflects thefact that O dd and E ven and I are mutuallyexclusive. Let usnote that E ven

(

a

,

b

,

c

,

d

)

→ ¬

O dd

(

{

a

,

b

,

c

},

d

),

that

¬

E ven

(

a

,

b

,

c

,

d

)

→ ¬

H4

(

a

,

b

,

c

,

d

)

andthat O dd

(

{

a

,

b

,

c

},

d

)

¬

H4

(

a

,

b

,

c

,

d

)

aswell.

Amorecompletediscussionoftherelationbetweenoddnessandevennesscanbefoundin[21].

3.2.3. Extensiontonumericaldata

Inordertodealwithgradedtruthvalues,weneedtoextendthepreviousdefinitionofEvengiveninformula(4)tothe casewherethetruthvaluesbelongto

[

0,1

]

,aswehavedonefortheoddnessfunction.

Adirecttranslationofformula(4),takingmin for

,max for

,and1

− |

· −

· |

for

asinŁukasiewicslogic,leadsto thefollowingexpressionforE ven4:

max

(

min

(

1

− |

min

(

a

,

b

)

min

(

1

c

,

d

)

|,

1

− |

min

(

1

a

,

1

b

)

min

(

c

,

1

d

)

|),

1

− |

max

(

a

,

b

,

c

,

d

)

min

(

a

,

b

,

c

,

d

)

|)

Letusexaminethebehaviorofthisdefinition.Inordertogetaclearpicture,weconsiderthe2followingcurves:

f

(

x

)

=

E ven

(0,

x

,

x

,

x

):

we wouldexpect f toget theconstant value 1,since, whateverits value,the last element x,

(11)

Fig. 1. f and g functions with standard definition of E ven.

Fig. 2. f and g functions with the new definition of E ven.

g

(

x

)

=

E ven

(0,

x

,

x

,

0): weexpect afunction decreasingfrom1 to 0 when x goesfrom0 to 1.Indeed,thesmaller x,

thecloserto 1 E ven(0,x

,

x

,

0) shouldbe,whilethelarger x themore0 appears tobe equaltotheminorityvalue in themultiset

{

0,x

,

x

}

.

The corresponding curves are inFig. 1. Black square dots are forfunction f , empty circles forfunction g (mindthat an emptycirclemaypartiallyhideablacksquarewhen f and g coincide).Ascanbeseen, f isnotaconstantfunctionandg

isnotmonotonicallydecreasing.ThiscontrastswithO dd

(0,

x

,

x

,

x

)

=

0 and O dd

(0,

x

,

x

,

0)

=

0.

It appears that a direct translationof the Booleandefinition (4) doesnot fit withthe expectedmeaning ofevenness in the caseof graded truth values. But obviously, we could start from the property E ven

≡ ¬

O dd

∧ ¬

I to get another translationas:

E ven

(

a

,

b

,

c

,

d

)

=

min

(

1

O dd

(

a

,

b

,

c

,

d

),

1

I

(

a

,

b

,

c

,

d

))

ThisnewdefinitionleadsviaaneasycomputationtoE ven

(0,

x

,

x

,

x

)

=

1

x ifx

0.5 and1

min

(

x

,

2

2x)whenx

0.5. Withthisnewdefinitionof E ven, thecurvescorresponding to f and g aregiveninFig. 2(weuse blacksquaredots for function f andemptycirclesforfunction g again).

Observethatthebehavior ofg

(

x

)

=

E ven

(0,

x

,

x

,

0)issatisfactory sincewegetthedecreasingfunction1

x.However

f

(

x

)

=

E ven

(0,

x

,

x

,

x

)

maybefarfrom1(inparticular,E ven

(0,

23

,

23

,

23

)

=

13).Suchabehaviorisnotsatisfactoryatall.Itis still anopenquestiontofindabetterdefinitionfor E ven inthegradedcase,whichwouldcoincidewiththeBooleancase whena

,

b

,

c

,

d

∈ {

0,1

}

.Forthisreason,weshallnotexperimenttheevennessfunctionwithnumericaldata.

3.2.4. Dealingwithmissingvalues

Missinginformationisquitecommoninreallifedatasetsandawaytoextendthesemanticsofanalogicalproportionto dealwiththisissuehasbeendeeplyinvestigatedin[20]forinstance.Infact,suchanapproachcanbealsoappliedhere,as explainednow.

Still keepingalogical approachandconsidering that‘?’ denotes amissingvalue (i.e.an informationisunknown),the idea is toextend thetruth table of the E ven formula asfollows: E ven

(?,

0,0,0)

=

E ven

(0,

?,0,0)

=

E ven

(0,

0,?,0)

=

1,

E ven

(?,

1,1,1)

=

E ven

(1,

?,1,1)

=

E ven

(1,

1,?,1)

=

1, and E ven

(

x

,

y

,

z

,

t

)

=

0 for any other patternincluding atleasta missing value ‘?’. It isclearthat, withthe 6 firstpatterns, whateverthe candidatevalue of themissing feature,the 4th argument belongs to the majority and cannot be an intruder. In all the remaining cases, where we have no certainty regardingthestatusofd,weadoptacautiousbehaviorbyconsideringthat E ven doesnothold.

(12)

4. Heterogeneousproportions-basedclassifiers

Intheprevioussection,wehavedefinedtwonewindexesthatevaluatetheoddness orevenness ofanitemwithrespect toa multiset S of afixed size(especially S is a tripleincaseofevenness). Inthe contextofclassification, ouraimisto maintainhomogeneityorevennessinsideeachclass

C

andtoavoidoddnesswhenclassifyinganewitem.Forthispurpose, wefirstextendO dd and E ven indexestodealwithvectorsinsteadofsimpleBooleanornumericalvalues,andthen,build upaglobaloddness/evennessmeasureofanitemx w.r.t.aclass

C

.

Inthefollowing,weproposeafamilyofclassifiersbasedontheseglobalevenness/oddnessmeasuresandindexedbythe sizeofthesubsets(usedinthecomparisonprocessfortheoddness-basedclassifiers).ButwefirstrecalltheBayesianview oftheconformityofanitemw.r.t.aclass,whichcontrastswiththeviewsproposedintherestofthissection.

4.1. TheBayesianviewpoint

Inaclassical Bayesianview,wehave P rob

(C|

x

)

=

1Z

·

P rob

(C)

·



ni=1P rob

(

xi

|

C)

assumingthat then featuresare inde-pendent.Theevidence Z dependsonlyon



x, P rob

(

C)

reflectssomecharacteristicsof

C

suchasitssize,and



ni=1P rob

(

xi

|

C)

evaluatestheconformityof



x

= (

x1

,

· · · ,

xn

)

with

C.

Undersomeconditionalindependenceassumptions,thisprobabilitycan berewrittenasaweightedproductofP rob

(

xi

|

C),

i.e.theconditionalprobabilitytogetvalue xiforfeaturei intheclass

C

. Usually, P rob

(

xi

|

C)

isestimatedasthefrequencyofelementshavingvalue xi forfeaturei inthewholeclass

C

.Thus,the expressionof P rob

(C|

x

)

involvestheproductof P rob

(C)

withakindofconjunctivecombinationexpressedastheproduct oftheproportionofelementsof

C

identicalto



x foreachfeaturei.A counterpartofthisevaluationexistsinthesettingof possibilitytheory[2].InthecaseofBooleanfeatures,let p

(

C,

i

)

betheproportionofthemajorityvalueforfeaturei in

C

. Then,anelementaryestimationof P rob

(

C|

x

)

is:

1 Z

·

P rob

(

C

)

·



iMp

(

C

,

i

)

·



jM

(

1

p

(

C

,

j

))

whereM

⊆ {

1,

· · ·,

n

}

isthesubsetoffeatureswhere



x isconformtothemajorityin

C

,andM isthecomplementarysubset where



x is notconform tothemajority. Theideaofconformityin thisapproach isthusrelatedtothenotion ofmajority w.r.t. thewhole set

C

itself.Inthefollowing,weinvestigatetheideaofjudgingconformityw.r.t. acollectionofsmallersubsets

S ⊂ C

,andthentocumulatetheresultsofthecomparisonof



x withthedifferentsubsets

S

.

4.2. Classificationindexes

When it comes to real life application, it is not enough to represent individualswith a single Boolean or realvalue. Generally,individualsareencodedbyasetoffeatures.Basedonthepreviouslydefinedoddnessandevennessmeasures,we havetodefinenewmeasuressuitableforvectors.

4.2.1. Oddnessandevennessmeasuresforvectors

Whendealing withvectors

x

∈ [

0,1

]

n,Booleanvectorsarealsocoveredasaparticularcase.The O dd and E ven

mea-sures,definedrespectivelyby(1)and(4),areusedtoestimatetowhatextentavalue x canbe consideredasodd oreven

amonga multiset S of values.Thanks tothetwo latterformulas,assuming the independenceoffeatures,it isnaturalto computetheoddness orevenness ofavector

x asthesum oftheoddness orevenness foreachfeaturexi

∈ →

x ,asfollows:

O dd

(

S

, →

x

)

=

de f



in=1O dd

(

Si

,

xi

)

∈ [

0

,

n

]

E ven

(

S

, →

x

)

=

de f



ni=1E ven

(

Si

,

xi

)

∈ [

0

,

n

]

wherexi isthei-thcomponentof

x ,Siisthemultisetgatheringthei-thcomponentsofthevectorsin

S

.Notethatinthe caseofE ven,theset

S

hasexactly3elements,butwekeepthesetnotationforsakeofnotationuniformity.

Ifouraimistomeasure oddness,highvaluesof O dd

(

S,

x

)

(closeton)meansthat,formany features,

x appearsas an intruderandmayreducethehomogeneitywhengoing from

S

tothemultiset

S ∪ {→

x

}

.If O dd

(

S,

x

)

=

0,nofeature indicatesthat

x behavesasanintruderandthereisnoobstaclefor

x tojointhemultiset S.1

Ontheopposite, ifouraimisto computeevenness ofthesubset

S

when

x isbeingadded,the bigger E ven

(S,

x

),

thelargerthenumberoffeatures forwhich

x conformstothe majorityin

S

,thebetter

x conformstovectorsin

S

.If

E ven

(S,

x

)

=

n,theredoesnotexista featurewhere

x behavesasanintruder.Then, itisacceptablefor

x to jointhe set

S

.

1 Itisclearthatwhendealingwithclassificationtask,Sisjustasetofexamples,withoutanyrepetition,butobviously,itsprojectionscomponentwise, theSi’saremultisetsofBooleanorrealvalues.

Figure

Fig. 2. f and g functions with the new definition of E ven.

Références

Documents relatifs

In this paper we present a new Boolean matrix factorization algorithm for this kind of data, which use the new knowledge from the theory of the Boolean factor analysis,

Future research issues for CBR in the cloud will be investigated, including the semantic description and retrieval of cloud services, the case-based analysis of time series

In particular, we know the exact value of both the total number of attractors and the number of attractors of period p, ∀p ∈ N , in the dynamics of positive and negative

In a more specific meaning a Boolean category should provide the ab- stract algebraic structure underlying the proofs in Boolean Logic, in the same sense as a Cartesian closed

- Kauffman’s model is a Boolean network having k quenched random connections per node, and evolving deterministically in time according to a quenched random rule.. They

To determine a methylation signature, the top 1 feature-selected gene, top 2 feature-selected genes and so forth were selected, until reaching the top 15 feature-selected genes.

To mitigate these problems, this approach is based on structured and advanced models while using for the plant the Boolean models containing rules, and for the constraints

We have already seen briefly that the not operation can be used to describe the relationship between the pairs of trigrams in the Primal arrangement, and that the Sinking