
Madalina Olteanu^a, Nathalie Villa-Vialaneix^{a,b}

^a SAMM, EA 4543, Université Paris 1, F-75634 Paris, France

^b INRA, UR 875, MIAT, F-31326 Castanet-Tolosan, France

Abstract

In some applications, and in order to address real-world situations better, data may be more complex than simple numerical vectors. In some examples, data can be known only through their pairwise dissimilarities or through multiple dissimilarities, each of them describing a particular feature of the data set. Several variants of the Self Organizing Map (SOM) algorithm were introduced to generalize the original algorithm to the framework of dissimilarity data. Whereas median SOM is based on a rough representation of the prototypes, relational SOM allows representing these prototypes by a virtual linear combination of all elements in the data set, referring to a pseudo-Euclidean framework. In the present article, an on-line version of relational SOM is introduced and studied. Similarly to the situation in the Euclidean framework, this on-line algorithm provides a better organization and is much less sensitive to prototype initialization than standard (batch) relational SOM. In a more general case, this stochastic version allows us to integrate an additional stochastic gradient descent step in the algorithm which can tune the respective weights of several dissimilarities in an optimal way: the resulting multiple relational SOM thus has the ability to integrate several sources of data of different types, or to make a consensus between several dissimilarities describing the same data. The algorithms introduced in this manuscript are tested on several data sets, including categorical data and graphs. On-line relational SOM is currently available in the R package SOMbrero that can be downloaded at http://sombrero.r-forge.r-project.org/ or directly tested on its Web User Interface at http://shiny.nathalievilla.org/sombrero.

Keywords: Self-Organizing Map, Dissimilarity, Kernel, On-line, Multiple

Email addresses: madalina.olteanu@univ-paris1.fr (Madalina Olteanu), nathalie.villa@toulouse.inra.fr (Nathalie Villa-Vialaneix)


1. Introduction

In many real-world applications, data cannot always be described by a fixed set of numerical attributes. This is the case, for instance, when data are described by categorical variables or by relations between objects (e.g., persons involved in a social network). This issue can be even trickier when the data are composed of several sources of non homogeneous information (e.g., a social network together with attributes on the nodes as in [1, 2]). A common solution to address this kind of issue is to use a measure of resemblance (i.e., a similarity or a dissimilarity) that can handle categorical variables or graphs, or that focuses on specific aspects of the data, and that is designed from expert knowledge [3]. Many standard methods for data mining have been generalized to non vectorial data, recently including prototype-based clustering, even though, in some cases, the choice of the most relevant dissimilarity remains an open issue (see [4, 5] for a discussion on this topic in the field of social science). The recent paper [6] provides an overview of several methods that have been proposed to tackle complex data with neural networks.

In particular, several extensions of the Self-Organizing Map (SOM) algorithm have been proposed. One approach consists in extending SOM to categorical data by using a method similar to Multiple Correspondence Analysis [7]. Another approach uses the median principle, which consists in replacing the standard computation of the prototypes by an approximation in the original data set. This principle was used to extend SOM to dissimilarity data in [8]. One of the main drawbacks of this approach is that forcing the prototypes to be chosen among the data set is very restrictive; in order to increase the flexibility of the representation, [9] proposes to represent a class by several prototypes, all chosen among the original data set. However, this method increases the computational time, while prototypes remain restricted to the original data set and may generate possible sampling or sparsity issues.

An alternative to median-based algorithms relies on a method that is close to the standard algorithm used in the Euclidean case. This method is based on the idea that prototypes may be expressed as linear combinations of the original input data. In the kernel SOM framework, this setting is made natural by the use of the kernel, which maps the original data into a (large dimensional) Euclidean space (see [10, 11, 12] for on-line versions and [13]


data such as strings, nodes in a graph or graphs themselves [14]. In some cases, the data are solely described by a dissimilarity matrix. [15, 16, 17] give necessary and sufficient conditions for a symmetric matrix to be a distance matrix in a Euclidean space but, as pointed out by [3], the class of similarities/dissimilarities that can be embedded in a Euclidean space is rather limited and does not accommodate a number of useful measures already developed in the literature. In this case, [18, 19, 20, 21] propose to introduce an implicit convex combination of the original data in order to extend the classical batch versions of SOM to dissimilarity data: this approach implicitly uses the embedding of the original data in a pseudo-Euclidean space, as defined in [22].

However, batch versions of the SOM algorithm are known, at least for the standard numerical SOM [23], to present several drawbacks such as poor organization and strong dependency on the prototype initialization. This problem may be partially countered using PCA or MDS initializations, but when no good initialization is available, a stochastic (also called on-line) version of the algorithm can be very beneficial. The purpose of the present paper is to introduce and justify the on-line version of relational SOM, as already proposed in [24]. Such an approach leads to a better organization of the map. Additionally, taking advantage of the stochastic scheme, relational SOM is extended to integrate several sources of non homogeneous information by using an adaptive convex combination of dissimilarities. The weights of each dissimilarity are updated during the SOM learning process by an additional stochastic gradient descent step. In the remainder of this manuscript, Section 2 describes the on-line extension of the relational SOM algorithm, already studied in [24], while Section 3 describes how this approach can be used to integrate multiple information coming either from different data sets or from different dissimilarity measures. Finally, Section 4 illustrates the approach on simulated and real-world data sets and compares it with previous literature. Note that the on-line relational SOM is available in the R [25] package SOMbrero, which can be downloaded on R-Forge [26]^1 or tested on its shiny [27] Web User Interface at http://shiny.nathalievilla.org/sombrero.

^1 http://sombrero.r-forge.r-project.org/


Let us recall that the Self-Organizing Map (SOM) algorithm aims at mapping $n$ input data $x_1, \ldots, x_n$ into a low dimensional grid composed of $U$ units. A prototype $p_u$, valued in the same space as the input data, is associated to each unit $u \in \{1, \ldots, U\}$ of the grid. The grid induces a natural distance $d$ on the map: for every pair of neurons $(u, u')$, $d(u, u')$ is usually defined as the length of the shortest path between $u$ and $u'$ (although other topologies are sometimes used, including the standard Euclidean distance on the grid). The algorithm aims at clustering together similar observations and also at preserving the original topology of the data set on the map (i.e., close observations are clustered into close units on the map, distant observations are clustered into distant units on the map). In order to do so, an iterative process is performed by alternating two steps. The original algorithm for numerical vectors may be summarized as follows:

• an assignment step where one observation (on-line version) or all observations (batch version) is/are assigned to the closest prototype (in the sense of the Euclidean distance):

$$f(x_i) = \arg\min_{u=1,\ldots,U} \| x_i - p_u \|,$$

• a representation step where all prototypes are updated according to the new assignment. For the on-line version of the algorithm, this step is performed by mimicking a stochastic gradient descent scheme:

$$p_u^{\text{new}} = p_u^{\text{old}} + \mu H\left(d(f(x_i), u)\right) \left( x_i - p_u^{\text{old}} \right), \tag{1}$$

where $H$ is the neighborhood function verifying the assumptions $H : \mathbb{R}^+ \to \mathbb{R}^+$, $H(0) = 1$ and $\lim_{x \to +\infty} H(x) = 0$, and $\mu$ is a training parameter. Generally, $H$ and $\mu$ are supposed to be decreasing with the number of iterations during the training procedure.
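The two steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function names, the Manhattan grid distance (the shortest path length on a 4-neighbour rectangular grid) and the Gaussian shape chosen for $H$ are not taken from the text, which only requires $H(0) = 1$ and $H(x) \to 0$.

```python
import numpy as np

def grid_distances(rows, cols):
    """Shortest-path (Manhattan) distances between the units of a rows x cols grid."""
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    return np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=2)

def online_som_step(prototypes, x, grid_d, mu, radius):
    """One on-line SOM iteration: assignment step, then the update of Equation (1)."""
    # Assignment step: index of the prototype closest to x (Euclidean norm).
    winner = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))
    # Neighborhood function H (assumed Gaussian here): H(0) = 1, H -> 0 at infinity.
    h = np.exp(-grid_d[winner] ** 2 / (2.0 * radius ** 2))
    # Representation step: every prototype moves towards x, weighted by mu * H.
    new_prototypes = prototypes + mu * h[:, None] * (x - prototypes)
    return new_prototypes, winner
```

With `mu` in $[0, 1]$, each prototype is replaced by a convex combination of its old value and `x`, so the winner (for which $H = 1$) moves the most.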

The original SOM algorithm described above does not possess a cost function and is not exactly a gradient descent, at least not in the continuous case. However, when the size of the neighborhood is fixed and with a modified assignment step, [28] proved that SOM is minimizing the following energy function:

$$E\left((p_u)_u\right) = \sum_{u=1}^{U} \int \delta_{u,f(x)} \sum_{l=1}^{U} H(d(l, u)) \, \| x - p_l \|^2 \, P(dx),$$

where

$$\delta_{u,f(x)} = \begin{cases} 1, & \text{if } f(x) = u \\ 0, & \text{otherwise.} \end{cases}$$

2.1. SOM for dissimilarity data

In the case where the input data take values in an arbitrary input space $\mathcal{G}$, a natural Euclidean structure is not necessarily associated with $\mathcal{G}$. Instead, the dissemblance between the observations can be described by a dissimilarity measure $\Delta = (\delta_{ij})_{i,j=1,\ldots,n}$ such that $\Delta$ is non negative ($\delta_{ij} \geq 0$), symmetric ($\delta_{ij} = \delta_{ji}$) and null on the diagonal ($\delta_{ii} = 0$). In this case however, the assignment step cannot be carried out straightforwardly since the distances between the input data and the prototypes are not directly computable.

Several extensions of the SOM algorithm have been proposed in this context: [8] proposes the median SOM where the prototypes are chosen among the input data $(x_i)_i$ in a batch framework. The assignment step is then similar to the Euclidean framework, with the dissimilarity replacing the Euclidean norm. The representation step simply finds the prototypes that minimize the energy of the map by an exhaustive search among the input data. [9, 29] extend this work by using several observations instead of a unique one for each prototype and by proposing a fast implementation of the algorithm. Nevertheless, choosing the prototypes among the input data is very restrictive and using several observations for each prototype strongly increases the computational time needed to train the map.

To overcome this difficulty, the solution proposed by [18, 19, 20, 21] is to rely on the pseudo-Euclidean framework: indeed, [22] pointed out that any data described by a symmetric dissimilarity matrix can be embedded in a space consisting of the orthogonal direct sum of two Euclidean spaces, for which the inner product operation is positive definite on the first space and negative definite on the second. Relying on this framework, and similarly to the kernel SOM approach [19], the prototypes are supposed to be symbolic convex combinations of the original data (actually, convex combinations of their implicit embedding in the pseudo-Euclidean space): $p_u \sim \sum_i \beta_{ui} x_i$ with $\sum_i \beta_{ui} = 1$ and $\beta_{ui} \geq 0$^2. If $\beta_u$ denotes the vector $(\beta_{u1}, \ldots, \beta_{un})$, the distance in the assignment step can be written in terms of $\Delta$ and $\beta_u$ only:

$$\delta(x_i, p_u) \equiv \Delta_i \beta_u - \frac{1}{2} \beta_u^T \Delta \beta_u, \tag{2}$$

where $\Delta_i$ is the $i$-th row of $\Delta$ (the formula is justified and proved in Appendix A). This algorithm, called relational SOM, was proposed in the batch framework where the representation step consists in updating the convex combination by a mean calculation:

$$\beta_{ui} = \frac{H(d(f(x_i), u))}{\sum_{i'} H(d(f(x_{i'}), u))}.$$
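Equation (2) and the batch update translate directly into matrix operations. The sketch below is illustrative only: the function names and the array layout (one row of `beta` per unit) are assumptions, not part of the original text.

```python
import numpy as np

def relational_distances(delta, beta):
    """Equation (2) for all pairs (i, u).

    delta: (n, n) symmetric dissimilarity matrix.
    beta:  (U, n) matrix whose row u holds the convex coefficients of prototype u.
    Returns an (n, U) matrix with entries Delta_i beta_u - 0.5 * beta_u^T Delta beta_u.
    """
    cross = delta @ beta.T                                          # Delta_i beta_u
    self_term = 0.5 * np.einsum("un,nm,um->u", beta, delta, beta)   # 0.5 beta_u^T Delta beta_u
    return cross - self_term[None, :]

def batch_update(h_matrix):
    """Batch representation step; h_matrix[u, i] holds H(d(f(x_i), u))."""
    return h_matrix / h_matrix.sum(axis=1, keepdims=True)
```

A quick sanity check, valid under the assumption that `delta` holds squared Euclidean distances, is that `relational_distances` then coincides with the squared distance between each $x_i$ and the explicit prototype $\sum_j \beta_{uj} x_j$.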

This approach is very similar to the batch kernel SOM described in [12, 13]. In kernel SOM, the Euclidean framework is justified by the definition of a kernel $K : \mathcal{G} \times \mathcal{G} \to \mathbb{R}$ that implicitly maps the data into a Hilbert space where the inner product is directly available via the kernel. Actually, batch kernel SOM and batch relational SOM are equivalent for a dissimilarity defined from the kernel by:

$$\delta(x_i, x_j) := K(x_i, x_i) + K(x_j, x_j) - 2K(x_i, x_j). \tag{3}$$
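As a small illustration of Equation (3), a kernel (Gram) matrix can be turned into a dissimilarity matrix as follows. The function name is hypothetical and the snippet is only a sketch.

```python
import numpy as np

def kernel_to_dissimilarity(K):
    """Dissimilarity of Equation (3): K(x_i, x_i) + K(x_j, x_j) - 2 K(x_i, x_j)."""
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2.0 * K
```

For the linear kernel $K = XX^T$, the result is the matrix of squared Euclidean distances between the rows of $X$.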

Reciprocally, if the dissimilarity matrix $\Delta$ can be embedded in a Euclidean space (i.e., if it fulfills the condition given in [15, 16, 17] which is that the matrix with elements $s_{ij} = \left(\delta(x_i, x_n)^2 + \delta(x_j, x_n)^2 - \delta(x_i, x_j)^2\right)/2$ is positive, or, similarly, if the matrix with elements

$$s(i, j) = -\frac{1}{2} \left( \delta^2(x_i, x_j) - \frac{1}{n} \sum_{k=1}^{n} \delta^2(x_i, x_k) - \frac{1}{n} \sum_{k=1}^{n} \delta^2(x_k, x_j) + \frac{1}{n^2} \sum_{k,k'=1}^{n} \delta^2(x_k, x_{k'}) \right)$$

as proposed in [30], is positive), then relational SOM is equivalent to kernel SOM used with the matrix $(s_{ij})_{ij}$, which, in this case, is a kernel. However, as explained in [3], some useful dissimilarities (e.g., shortest path lengths in graphs or optimal matching dissimilarities for sequences of events, [31, 32])

^2 Note that this sum has no real meaning, most of the time, as $\mathcal{G}$ is not necessarily equipped with a $+$ operation nor with a multiplication by a scalar. It simply implicitly refers to the $+$ operation in the underlying pseudo-Euclidean space: the formal definition of $p_u$ is given in Appendix A.


Euclidean space. In these cases, the dissimilarity can be turned into a kernel using various pre-processings, as described in [33] but then, relational SOM and kernel SOM are no longer identical.
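The double-centering criterion recalled above can be checked numerically. The following sketch (hypothetical function names, tolerance chosen arbitrarily) takes a matrix of squared dissimilarities $\delta^2(x_i, x_j)$, builds the matrix $s(i, j)$ and tests whether its smallest eigenvalue is non negative.

```python
import numpy as np

def double_centering(sq_delta):
    """Matrix s(i, j) = -1/2 * (doubly centered squared dissimilarities)."""
    n = sq_delta.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * centering @ sq_delta @ centering

def is_euclidean(sq_delta, tol=1e-8):
    """True when the double-centered matrix has no significantly negative eigenvalue."""
    eigvals = np.linalg.eigvalsh(double_centering(sq_delta))
    return bool(eigvals.min() >= -tol * max(1.0, np.abs(eigvals).max()))
```

For instance, the shortest path lengths on a cycle with four nodes fail this test, which is a small example of a graph dissimilarity that is not Euclidean.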

2.2. On-line relational SOM

As explained in [23], although batch SOM possesses the nice properties of being deterministic and of usually converging in a few iterations, it has several drawbacks such as organizing the map rather poorly, producing unbalanced classes and being strongly dependent on the initialization. Hence, using the same ideas as [18, 20], we introduce the on-line relational SOM, which generalizes the on-line SOM to the case of dissimilarity data. The proposed method is described in Algorithm 1. In this algorithm, only one observation,

Algorithm 1 On-line relational SOM

1: For all $u = 1, \ldots, U$ and $i = 1, \ldots, n$, initialize $\beta_{ui}^0$ such that $\beta_{ui}^0 \geq 0$ and $\sum_{i=1}^{n} \beta_{ui}^0 = 1$.

2: for $t = 1, \ldots, T$ do

3:   Randomly choose an input $x_i$

4:   Assignment step: find the unit of the closest prototype

       $f^t(x_i) \leftarrow \arg\min_{u=1,\ldots,U} \left( \Delta_i \beta_u^{t-1} - \frac{1}{2} (\beta_u^{t-1})^T \Delta \beta_u^{t-1} \right)$

5:   Representation step: $\forall\, u = 1, \ldots, U$,

       $\beta_u^t \leftarrow \beta_u^{t-1} + \mu(t) H^t\left(d(f^t(x_i), u)\right) \left( \mathbf{1}_i - \beta_u^{t-1} \right)$

     where $\mathbf{1}_i$ is a vector with a single non null coefficient at the $i$-th position, equal to one.

6: end for

randomly chosen, is assigned to a unit of the map at each iteration step. The representation step is drawn from Equation (1) by using a similar approach to update the prototypes' coordinates $(\beta_{ui})_{ui}$. Note that the constraints on $(\beta_{ui})_{ui}$ are preserved since:

• $\sum_i \beta_{ui}^t = 1$ (as demonstrated in Appendix B);

• $\beta_{ui}^t \geq 0$ for any $u$ and $i$ as long as $\mu(t)$ is small enough ($\mu(t) H^t$ must simply be smaller than 1).
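Both properties can be read off the update rule of Algorithm 1 in one line. The shorthand $\lambda = \mu(t) H^t(d(f^t(x_i), u))$ is introduced here for convenience and is not part of the original notation; the computation assumes that $\beta_u^{t-1}$ already satisfies the constraints.

```latex
\beta_{uj}^{t} = (1 - \lambda)\, \beta_{uj}^{t-1} + \lambda\, (\mathbf{1}_i)_j
\quad\Longrightarrow\quad
\sum_{j=1}^{n} \beta_{uj}^{t}
  = (1 - \lambda) \underbrace{\sum_{j=1}^{n} \beta_{uj}^{t-1}}_{=1}
  + \lambda \underbrace{\sum_{j=1}^{n} (\mathbf{1}_i)_j}_{=1}
  = 1,
\qquad
\beta_{uj}^{t} \geq 0 \ \text{ whenever } 0 \leq \lambda \leq 1 .
```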

This latter condition is easy enough to handle. In our experiments, the parameters of the algorithm are chosen according to [34]: the neighborhood $H^t$ decreases in a piecewise linear way, starting from a neighborhood which corresponds to the whole grid up to a neighborhood restricted to the neuron itself; $\mu(t)$ vanishes at the rate of $1/t$.
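Algorithm 1 can be sketched as follows. This is a minimal illustration, not the SOMbrero implementation: the function names, the rectangular grid with Manhattan distance, the hard-threshold neighborhood whose radius shrinks linearly, and the schedule `mu0 / (1 + t / n)` (which vanishes at rate $1/t$) are all assumptions made for the example.

```python
import numpy as np

def relational_dist_to_units(delta, beta, i):
    """Equation (2) for observation i and every unit (vector of length U)."""
    return beta @ delta[i] - 0.5 * np.einsum("un,nm,um->u", beta, delta, beta)

def online_relational_som(delta, rows, cols, n_iter, mu0=0.5, seed=0):
    """On-line relational SOM on an n x n dissimilarity matrix; returns beta of shape (U, n)."""
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    grid_d = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=2)
    # Step 1: random convex coefficients (non negative, each row sums to one).
    beta = rng.random((rows * cols, n))
    beta /= beta.sum(axis=1, keepdims=True)
    for t in range(1, n_iter + 1):
        i = int(rng.integers(n))                                   # random observation
        winner = int(np.argmin(relational_dist_to_units(delta, beta, i)))
        # Neighborhood: whole grid at the start, only the winner at the end (assumed form).
        radius = grid_d.max() * (1.0 - (t - 1) / n_iter)
        h = (grid_d[winner] <= radius).astype(float)
        mu = mu0 / (1.0 + t / n)                                   # keeps mu * h < 1
        w = (mu * h)[:, None]
        one_i = np.zeros(n)
        one_i[i] = 1.0
        # Same update as beta + w * (one_i - beta), written as a convex combination.
        beta = (1.0 - w) * beta + w * one_i
    return beta
```

Once trained, observation $i$ is classified in unit `np.argmin(relational_dist_to_units(delta, beta, i))`. The quadratic term is recomputed at every iteration for readability; a real implementation would likely cache it.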

2.3. Discussion on the algorithm: relations to previous algorithms, complexity and convergence

If the dissimilarity matrix is a Euclidean distance, then the on-line relational SOM is exactly identical to the standard numerical SOM as long as the prototypes of the original SOM are initialized in the convex hull of the original data (i.e., the initial prototypes can be written $p_u^0 = \sum_i \beta_{ui}^0 x_i$). Similarly, the on-line relational SOM is identical to on-line kernel SOM as described in [10, 11, 12] for a dissimilarity defined from a kernel $K$ by Equation (3) or if the dissimilarity fulfills one of the conditions in [15, 16, 17].

Moreover, if one wants to generalize dissimilarities to non-symmetric relations (such as, for example, graph-based comparisons of protein fingerprint graphs), a dissimilarity matrix computed as the half-sum of pairwise relations, $(\delta_{ij} + \delta_{ji})/2$, may be considered as the input for the algorithm.

In order to illustrate the performances of the on-line relational SOM compared to the batch implementation, 500 points are considered, sampled randomly from the uniform distribution in $[0, 1]^2$. The dissimilarity is computed as the length of the shortest path in the graph induced by the Delaunay triangulation (this graph is displayed in Figure 1). Note that this dissimilarity is not exactly equivalent to the Euclidean $\mathbb{R}^2$-metric, since it is not even Euclidean. The batch version of relational SOM and the on-line version of relational SOM were trained on identical $10 \times 10$ grid structures. The algorithms were trained either with identical initializations, or with a PCA initialization^3 for the batch SOM, which is the standard initialization used to alleviate the initialization dependency of this algorithm. Results are available in Figure 1 and clearly show a much better organization of the prototypes in the final grid provided by the on-line version of the algorithm. When the

^3 Dissimilarity PCA was used and then properly re-scaled to satisfy the condition $\sum_i \beta_{ui} = 1$.

Figure 1: 500 points sampled from the uniform distribution in $[0, 1]^2$ and their Delaunay graph (top left) and map organizations obtained by relational on-line SOM (top right) and relational batch SOM (bottom left) with random initialization and by relational batch SOM with dissimilarity-PCA initialization (bottom right).

prototypes are initialized with a PCA, the organization of the map produced by the batch kernel SOM is much better but still slightly worse than the one obtained with the on-line version and a random initialization. This visual effect is confirmed when calculating the topographic error [35]: this error quantifies the continuity of the map with respect to the input space metric by computing the proportion of observations whose second best matching unit does not belong to the direct neighborhood of their best matching unit. A topographic error equal to 0 means that all second best matching units are in the direct neighborhood of the winner neurons and thus that the original topology of the data is well preserved on the map. In this simple example, it is equal to 0.01 for the on-line relational SOM, to 0.176 for the batch relational SOM with PCA initialization and to 0.264 for the batch relational SOM with random initialization. Hence, the classical initialization dependency of the batch version of the algorithm, as already


no good initialization is present (i.e., when PCA or MDS are bad initialization strategies), the on-line version can then be very beneficial. Finally, the complexity of the on-line and batch versions is similar (of order $O(Un^2)$) with usually a smaller number of iterations needed to stabilize the batch version: the convergence of the batch version is attained with quadratic speed while the on-line version converges with a linear speed. However, the better organization of the map compensates for this small loss in computational time. Lastly, let us remark that, formally speaking, in the pseudo-Euclidean setting, the convergence of both algorithms (on-line and batch) is not even guaranteed (saddle points can be present instead of local optima, as pointed out in [21]) but, in practical applications, divergence was never observed.
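The topographic error used in this comparison can be computed from the matrix of observation-to-prototype dissimilarities. The sketch below is illustrative: the function name is hypothetical and the "direct neighborhood" is assumed to be the set of units at grid distance at most 1.

```python
import numpy as np

def topographic_error(dist, grid_d):
    """Proportion of observations whose two best matching units are not direct neighbors.

    dist:   (n, U) dissimilarities between observations and prototypes.
    grid_d: (U, U) distances between units on the grid.
    """
    order = np.argsort(dist, axis=1)
    best, second = order[:, 0], order[:, 1]
    # Assumption: two units are direct neighbors when their grid distance is <= 1.
    return float(np.mean(grid_d[best, second] > 1))
```

On a line of three units, an observation whose best and second best units are the two end units counts as one error, whereas adjacent best and second best units count as none.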

3. Integrating multiple dissimilarities

In some specific applications, the user is interested in simultaneously analyzing several sources of information: a graph together with additional information known on its nodes, numerical variables measured on individuals together with factors describing these individuals... This situation is often referred to as multiple view data and such data are quite common in a number of fields: gene clustering from expression profiles and ontology information [36] in biology, node clustering in a social network taking into account attributes that describe the nodes [37, 1] in social sciences, and molecule clustering from fingerprints and spatial structures [38] in chemistry. In other specific applications, the data set can be described by several dissimilarities, each encoding specific features of the data but none of them being acknowledged as more informative than the others: in social sciences for instance, the choice of a good dissimilarity to describe the resemblance between two event time series is still an open issue [4, 5].

The combination of all sources of information or of several dissimilarities is a challenging problem that aims at increasing the relevance of the clustering. In clustering, this issue has already been tackled by different approaches: some rely on clustering ensembles, combining together the clusterings obtained from each view or from each dissimilarity into a consensus clustering [39]. A more complex strategy, described in [40], iteratively updates the different clusterings using a global log-likelihood approach until they converge to a consensus. Other authors propose to concatenate all data/views prior to the clustering. If kernels are available, this method is known as multi-


combination and the coefficients of the convex combination are optimized together with the clustering [41, 42]. In a similar way, if the data are described by numerical variables belonging to different feature groups, [43] proposes to weight each group and to optimize simultaneously the clustering and the weights of the groups.

For the SOM algorithm as well, a few articles tackle related issues: in particular, [44] combines numeric and binary variables to produce a single map by optimizing two quantization energies in parallel and [1, 2] use a multiple kernel framework to integrate various information.

In the present section, we use a similar approach by combining different dissimilarities in a convex combination. We propose an algorithm which learns an optimal combination on-line, by minimizing the energy function.

3.1. Computing a multiple dissimilarity

Suppose now that the observations $x_1, \ldots, x_n$ are not described by a single dissimilarity matrix $\Delta$, but by $D$ dissimilarity matrices $\Delta^1, \ldots, \Delta^D$, where $\Delta^d = \left(\delta^d(x_i, x_j)\right)_{ij}$. The dissimilarities can be either different dissimilarities computed on the same data or dissimilarities computed from different variables measured on the same individuals (e.g., a dissimilarity that measures proximities between nodes in a graph and a dissimilarity that measures proximities between the node labels, see Section 4.3 for an example).

Similarly to the multiple kernel approach described in [45] or in [1] (for multiple kernel SOM), we propose to combine all the dissimilarities into a single one, defined as a convex combination:

$$\delta_{ij}^{\alpha} = \sum_{d=1}^{D} \alpha_d \, \delta_{ij}^{d} \tag{4}$$

where $\alpha_d \geq 0$ and $\sum_{d=1}^{D} \alpha_d = 1$. In the Euclidean framework, this approach is strictly equivalent to the multiple kernel SOM approach because $\| x_i - x_j \|_d^2 = \langle x_i - x_j, x_i - x_j \rangle_d$ (multiple kernel is a convex combination of dot products whereas Equation (4) is based on a convex combination of squared distances).
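Equation (4) amounts to a weighted sum of matrices. A minimal sketch, with a hypothetical function name and an explicit check of the constraints on the weights:

```python
import numpy as np

def combine_dissimilarities(deltas, alpha):
    """Convex combination of Equation (4).

    deltas: sequence of D dissimilarity matrices of shape (n, n).
    alpha:  D non negative weights summing to one.
    """
    deltas = np.asarray(deltas, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if np.any(alpha < 0) or not np.isclose(alpha.sum(), 1.0):
        raise ValueError("alpha must be non negative and sum to one")
    # Sum over the first axis of deltas, weighted by alpha.
    return np.tensordot(alpha, deltas, axes=1)
```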


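If the $(\alpha_d)_d$ are given, relational SOM based on the dissimilarity introduced in Equation (4) aims at minimizing (over $(\beta_u)_u$) the following energy function

$$E\left((\beta_u)_u, (\alpha_d)_d\right) = \sum_{u=1}^{U} \sum_{i=1}^{n} H\left(d(f(x_i), u)\right) \delta^{\alpha}\left(x_i, p_u(\beta_u)\right),$$

where $\delta^{\alpha}(x_i, p_u(\beta_u))$ is defined as in Equation (2) by

$$\delta^{\alpha}\left(x_i, p_u(\beta_u)\right) \equiv \Delta^{\alpha}_i \beta_u - \frac{1}{2} \beta_u^T \Delta^{\alpha} \beta_u \tag{5}$$

with $\Delta^{\alpha} = \sum_d \alpha_d \Delta^d$.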

When there is no a priori knowledge on the $(\alpha_d)_d$, we propose to include the optimization of the convex combination within the on-line algorithm which trains the map. This idea is similar to the one proposed in [46] for optimizing a kernel parameter in vector quantization algorithms. More precisely, a stochastic gradient descent step is added to the original on-line relational SOM algorithm to optimize the energy $E\left((\beta_{ui})_{ui}, (\alpha_d)_d\right)$, over both $(\beta_{ui})_{ui}$ and $(\alpha_d)_d$. To perform the stochastic gradient descent step on the $(\alpha_d)_d$, the computation of the derivative of

$$E|_{x_i} = \sum_{u=1}^{U} H\left(d(f(x_i), u)\right) \delta^{\alpha}\left(x_i, p_u(\beta_u)\right)$$

(the contribution of the randomly chosen observation $x_i$ to the energy) with respect to $\alpha$ is needed. Since

$$\frac{\partial}{\partial \alpha_d} \left[ \delta^{\alpha}(x_i, p_u) \right] = \delta^{d}(x_i, p_u),$$

we have

$$D_{id} = \frac{\partial E|_{x_i}}{\partial \alpha_d} = \sum_{u=1}^{U} H\left(d(f(x_i), u)\right) \left( \Delta^{d}_i \beta_u - \frac{1}{2} \beta_u^T \Delta^{d} \beta_u \right).$$
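The gradient $D_{id}$ can be evaluated for all $d$ at once. The sketch below uses a hypothetical function name and assumes the $D$ dissimilarity matrices are stacked in an array of shape `(D, n, n)`.

```python
import numpy as np

def alpha_gradient(deltas, beta, h, i):
    """Gradient of E|x_i with respect to alpha (vector of length D).

    deltas: (D, n, n) stacked dissimilarity matrices.
    beta:   (U, n) convex coefficients of the prototypes.
    h:      (U,) values H(d(f(x_i), u)) for the current observation.
    i:      index of the current observation.
    """
    cross = deltas[:, i, :] @ beta.T                                    # (D, U): Delta^d_i beta_u
    self_term = 0.5 * np.einsum("un,dnm,um->du", beta, deltas, beta)    # (D, U)
    return (cross - self_term) @ h
```

Because $\delta^{\alpha}$ is linear in $\alpha$, the scalar product between $\alpha$ and this gradient gives back $E|_{x_i}$, which is a convenient consistency check.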

Following an idea similar to that of [45], the SOM is trained by performing, alternatively, the standard steps of the SOM algorithm (i.e., assignment and representation steps) and a gradient descent step for the $(\alpha_d)_d$. The methodology is described in Algorithm 2.


1: For all $u = 1, \ldots, U$ and $i = 1, \ldots, n$, initialize $\beta_{ui}^0$ such that $\beta_{ui}^0 \geq 0$ and $\sum_{i=1}^{n} \beta_{ui}^0 = 1$.

2: For all $d = 1, \ldots, D$, initialize $\alpha_d^0 \in [0, 1]$ such that $\sum_d \alpha_d^0 = 1$. Set $\delta^{\alpha,0} \leftarrow \sum_d \alpha_d^0 \delta^d$.

3: for $t = 1, \ldots, T$ do

4:   Randomly choose an input $x_i$

5:   Assignment step: find the unit of the closest prototype

       $f^t(x_i) \leftarrow \arg\min_{u=1,\ldots,U} \delta^{\alpha,t-1}\left(x_i, p_u(\beta_u)\right)$

     where $\delta^{\alpha,t-1}(x_i, p_u(\beta_u))$ is defined as in Equation (5).

6:   Representation step: update all prototypes according to the new assignment: $\forall\, u = 1, \ldots, U$,

       $\beta_u^t \leftarrow \beta_u^{t-1} + \mu(t) H\left(d(f(x_i), u)\right) \left( \mathbf{1}_i - \beta_u^{t-1} \right)$

7:   Gradient descent step: update the convex combination parameters: $\forall\, d = 1, \ldots, D$,

       $\alpha_d^t \leftarrow \alpha_d^{t-1} + \nu(t) D_d^t$

     where $D_d^t$ is the descent direction, and update $\delta^{\alpha,t}$:

       $\delta^{\alpha,t} \leftarrow \sum_d \alpha_d^t \delta^d.$

8: end for

To ensure that the gradient step respects the constraints on $\alpha$ ($\alpha_d \geq 0$ and $\sum_d \alpha_d = 1$), the following strategy is used: similarly to [47, 48, 45], the gradient $\left( \frac{\partial E^{t-1}|_{x_i}}{\partial \alpha_d} \right)_d$ is reduced and projected such that the non-negativity of $\alpha$ is ensured. The following modified descent step is thus used:

$$\widetilde{D}_d = \begin{cases} 0 & \text{if } \alpha_d = 0 \text{ and } D_d - D_{d_0} > 0 \\ -D_d + D_{d_0} & \text{if } \alpha_d > 0 \text{ and } d \neq d_0 \\ \sum_{d \neq d_0,\, \alpha_d > 0} \left( D_d - D_{d_0} \right) & \text{otherwise} \end{cases}$$

The descent step $\nu(t)$ is decreased with the standard rate of $\nu_0 / t$ with an initial $\nu_0$ small enough to ensure the positivity constraint on $(\alpha_d)_d$.
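A possible reading of this reduced and projected gradient step is sketched below. Several points are assumptions and not statements from the text: the reference index $d_0$ is taken as the index of the largest $\alpha_d$ (as is common in reduced gradient methods), a component with $\alpha_d = 0$ and a non positive reduced gradient is allowed to grow, the component $d_0$ absorbs the opposite of all other moves so that the weights still sum to one, and the step is capped so that no weight becomes negative. The function names are hypothetical.

```python
import numpy as np

def reduced_descent_direction(alpha, grad):
    """Descent direction on the simplex; its components sum to zero."""
    alpha = np.asarray(alpha, dtype=float)
    grad = np.asarray(grad, dtype=float)
    d0 = int(np.argmax(alpha))                        # assumption: d0 = argmax_d alpha_d
    reduced = grad - grad[d0]                         # D_d - D_{d0}
    direction = -reduced                              # -D_d + D_{d0}
    direction[(alpha <= 0) & (reduced > 0)] = 0.0     # do not push a null weight below zero
    direction[d0] = 0.0
    direction[d0] = -direction.sum()                  # keeps sum(alpha) equal to one
    return direction

def update_alpha(alpha, grad, nu):
    """One gradient step alpha + nu * direction, with nu capped to keep alpha >= 0."""
    alpha = np.asarray(alpha, dtype=float)
    direction = reduced_descent_direction(alpha, grad)
    shrinking = direction < 0
    if shrinking.any():
        nu = min(nu, float(np.min(-alpha[shrinking] / direction[shrinking])))
    return np.maximum(alpha + nu * direction, 0.0)    # clip rounding errors only
```

For example, with `alpha = [0.5, 0.5, 0.0]` and `grad = [1.0, 3.0, 2.0]`, the direction is `[2, -2, 0]`: weight moves from the second dissimilarity (largest gradient) to the first, and the third one stays at zero.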


In this section, several applications, on simulated or real-life data sets, illustrate the performances of the proposed methods. Section 4.1 compares on-line and batch relational SOM on a DNA barcoding data set, Section 4.2 compares the use of dissimilarities and kernels for mapping two political graphs into a grid, Section 4.3 illustrates the efficiency of the use of a multiple dissimilarity approach on a simulated data set and, finally, Section 4.4 applies the multiple relational SOM to a large data set of categorical time series and shows that the multiple relational SOM approach can be used to interpret which dissimilarities produce the most relevant clusters.

4.1. Comparison between on-line and batch relational SOM on a genetic data set

This first experiment aims at providing a comparison between on-line and batch relational SOM. It is performed on a data set that contains 465 input data issued from ten unbalanced sampled species of Amazonian butterflies. This data set was previously used by [49] to demonstrate the synergy between DNA barcoding and morphological-diversity studies. The notion of DNA barcoding comprises a wide family of molecular and bioinformatics methods aimed at identifying biological specimens and assigning them to a species. According to the vast literature published during the past years on the topic, two separate tasks emerge for DNA barcoding: on the one hand, assign unknown observations to known species and, on the other hand, discover undescribed species [50]. The second task is usually approached with the Neighbor Joining algorithm [51] which constructs a tree similar to a dendrogram. When the sample size is large, the trees rapidly become unreadable. Moreover, they are quite sensitive to the order in which the input data are presented. Unsupervised learning and visualization methods are used to a very limited extent by the DNA barcoding community, although the information they bring may be quite useful. Self-organizing maps provide a visualization of the data while bringing out clusters or groups of clusters that may correspond to yet unknown species.

DNA barcoding data are composed of sequences of nucleotides, i.e. sequences of a, c, g, t letters in high dimension (hundreds or thousands of sites). Hence, since the data are not Euclidean, dissimilarity-based methods appear to be more appropriate. Specific distances and dissimilarities such as the Kimura-2P [52] are usually computed. Recently, batch median SOM was


Although median SOM provided encouraging results, two main drawbacks emerged. First, since the algorithm was run in batch, the organization of the map was generally poor and highly dependent on the initialization. Second, since the algorithm calculates a prototype for each cluster among the data set, it does not allow for empty clusters. Thus, the existence of species or groups of species was difficult to acknowledge. The use of on-line relational SOM overcomes these two issues. Figure 2 contains the maps obtained with median SOM and relational SOM with PCA initialization, both trained in batch versions^4, and Figure 3 illustrates the mapping produced with the on-line relational SOM. The three algorithms were run with identical fixed seeds for the random generators. The clustering quality of median SOM is poor, since several clusters mix together several species. On the contrary, relational SOM allows for empty clusters and thus produces a better mapping, from a clustering point of view: the only mixing class corresponds to a labeling error. Moreover, the empty cells help to separate the main groups of species. Clustering may thus be useful in addressing misidentification issues.

Topographic errors were computed for the three mappings in order to assess the quality of the projection. For the on-line algorithm, the error is 0.0022, for relational SOM with PCA initialization we obtained 0.3682, while the error of median SOM is 0.3094. Hence, the stochasticity of the on-line algorithm allowed for a better organization of the map, compared with batch algorithms.

In Figure 3b, distances with respect to the nearest neighbors were computed for each node. The distance between two nodes/cells is computed as the mean dissimilarity between the observations within each class. A polygon is drawn within each cell with vertices proportional to the distances to its neighbors. If two neighbor prototypes are very close, then the corresponding vertices are very close to the edges of the two cells. If the distance between neighbor prototypes is very large, then the corresponding vertices are far apart, close to the center of the cells.

^4 Relational batch SOM with random initialization was also tested but, since the results were worse than the ones obtained with PCA initialization, they are not shown in this article.


Figure 2: Species diversity distribution by cluster (radius proportional to the size of the cluster): Median batch SOM (a) and Relational batch SOM with PCA initialization (b).

Figure 3: On-line relational SOM results for Amazonian butterflies: (a) Species diversity distribution by cluster (radius is proportional to the cluster size). (b) Distances between prototypes.

4.2. On-line relational SOM and on-line kernel SOM to decipher the structure of political networks

The purpose of this section is to compare the performances obtained with relational SOM when used with various metrics. More precisely, we will show that for structural data such as graphs, a kernel is not always the most relevant way to extract information from the graph structure, compared to, e.g., the simple dissimilarity based on the length of the shortest path between two nodes.

The data used in this section come from two famous data sets pertaining to US politics. The first data set is a graph where the nodes are 105 American political books, all published around the presidential election of

(17)

two books were o-purhased by a ommon buyer 5

. All nodes are labeled

aording to their politial aliation (onservative, liberal or neutral), and

this informationwillbe used tovalidate the results a posteriori.

The second data set is a graph representing the US politics blogosphere, recorded in 2004, for the same presidential election as the previous one, by Adamic and Glance [54]. This data set contains 1 222 nodes, which are political blogs, and 16 714 edges, each of which represents a hyperlink between two blogs⁶. Again, additional information pertaining to the political preference of the blog is also provided (here only conservative or liberal). Both graphs are represented in Figure 4 using the Fruchterman and Reingold [55] force-directed placement algorithm, and nodes are colored according to the political affiliation of the book or of the blog.

Figure 4: Political books (left) and blogs (right) networks. Nodes are labeled according to the political orientation of the book or of the blog: pink is for conservative, blue for liberal and green for neutral.

Relational SOM was performed to project the nodes of the two graphs on a square grid having dimension 5 × 5 (books) and 10 × 10 (blogs). Three different dissimilarities were used to perform this task:

5 This graph was built by Valdis Krebs and is available for downloading at http://www-personal.umich.edu/~mejn/netdata/polbooks.zip.

6 The original graph was directed but we only used undirected edges to perform our analysis.

• the length of the shortest path between two nodes. Note that, in general, the length of the shortest path is not a Euclidean distance: for the two graphs described in this section, the condition of [16] is not satisfied;

• a dissimilarity defined as the square of the distance induced by the heat kernel ($K = e^{-\gamma L}$ where $L$ is the Laplacian, [56]), with parameters γ = 0.1 and γ = 1. In this case, relational SOM is equivalent to kernel SOM as described in [18, 20];

• a dissimilarity defined as the square of the distance induced by the commute time kernel [57].
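As an illustration of the first dissimilarity in the list above, the following minimal sketch computes the matrix of shortest-path lengths of an unweighted, undirected graph by breadth-first search. The function name and the toy graph are assumptions made for the example; this is not the code of the SOMbrero package.

```python
from collections import deque


def shortest_path_dissimilarity(n, edges):
    """Return the n x n matrix of shortest-path lengths (hypothetical helper).

    Nodes are numbered 0..n-1 and edges are undirected pairs.
    Pairs of nodes that are not connected keep an infinite length.
    """
    neighbors = [[] for _ in range(n)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    delta = [[float("inf")] * n for _ in range(n)]
    for source in range(n):
        # Breadth-first search from each node in turn
        delta[source][source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if delta[source][nxt] == float("inf"):
                    delta[source][nxt] = delta[source][node] + 1
                    queue.append(nxt)
    return delta


# Toy path graph 0 - 1 - 2 - 3 (illustrative data only)
delta = shortest_path_dissimilarity(4, [(0, 1), (1, 2), (2, 3)])
```

The resulting matrix is symmetric with a zero diagonal, which is all that relational SOM requires of its input; it does not need to be Euclidean.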

The performances of the tested methods were assessed using three criteria: the modularity of the obtained partition, the neurons' purity (compared to the political labels) and the topographic error of the map. The modularity [58] is a measure of quality of a partition of the nodes in a graph:

$$Q = \sum_{u=1}^{U} \; \sum_{i,j:\, f(x_i) = f(x_j) = u} \left( E_{ij} - \frac{d_i d_j}{2m} \right)$$

where $E_{ij} = 1$ if there is an edge between nodes $x_i$ and $x_j$, $d_i$ is the degree of node $x_i$ and $m$ is the number of edges in the graph. The best partition corresponds to the largest modularity. The neurons' purity is a measure of the consistency of the clustering with respect to the political labels: it counts the frequency of the political labels of the nodes that are equal to the majority political label of the node's cluster. The closer to 1 the purity is, the better the clustering is. The last quality criterion, the topographic error of the map [35], quantifies the continuity of the map with respect to the input-space metric, as already explained in Section 2.3. Notice that, as it computes the second best matching unit, the topographic error depends on the metric of the input space itself and tells us if this metric is well preserved on the map.
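The first two criteria can be sketched in a few lines. The code below evaluates the modularity sum exactly as written above and the purity as the share of nodes carrying the majority label of their cluster. The optional division by 2m (the commonly used normalized form of the modularity) is an assumption, since that prefactor does not appear in the formula; function names and the toy graph are also illustrative.

```python
from collections import Counter


def modularity(edges, clusters, normalized=False):
    """Sum of E_ij - d_i d_j / (2m) over ordered pairs in the same cluster.

    With normalized=True the sum is divided by 2m (assumption: usual
    normalization, not shown in the formula of the text).
    """
    n = len(clusters)
    m = len(edges)
    degree = [0] * n
    adjacent = set()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        adjacent.add((a, b))
        adjacent.add((b, a))
    q = 0.0
    for i in range(n):
        for j in range(n):
            if clusters[i] == clusters[j]:
                e_ij = 1.0 if (i, j) in adjacent else 0.0
                q += e_ij - degree[i] * degree[j] / (2.0 * m)
    return q / (2.0 * m) if normalized else q


def purity(clusters, labels):
    """Fraction of nodes whose label is the majority label of their cluster."""
    by_cluster = {}
    for c, lab in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[lab] += 1
    agree = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return agree / len(labels)


# Two triangles joined by one edge (illustrative data only)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = [0, 0, 0, 1, 1, 1]
labels = ["conservative", "conservative", "liberal",
          "liberal", "liberal", "liberal"]
q = modularity(edges, clusters)
p = purity(clusters, labels)
```

On this toy graph each triangle is a cluster, so the modularity is high, while one mislabeled node in the first cluster lowers the purity below 1.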

The results are given in Table 1. In addition, Figures 5 (books) and 6 (blogs) display two of the maps obtained for each data set. First note that the modularity obtained with the SOM algorithm should not be compared with that of a standard node clustering algorithm: the number of clusters used in such maps is often much larger than the optimal number of clusters for the modularity (for instance, the optimal modularity found by the algorithm described in [59] gives only 10 clusters, that should be compared to the 100 clusters of the map). Nevertheless, this measure of the clustering quality is still valid for comparing different dissimilarities.

Dissimilarity      Shortest path   Heat kernel   Heat kernel   Commute time
                   length          γ = 0.1       γ = 1         kernel
Political books
  modularity       0.25            -0.05         0.08          0.27
  purity           0.88            0.61          0.72          0.89
  topo. error      0.048           0.133         0.038         0.038
Political blogs
  modularity       0.08            0.02          0.00          0.00
  purity           0.93            0.89          0.79          0.57
  topo. error      0.303           0.047         0.322         0.899

Table 1: Modularity, neurons' purity and topographic error obtained for the data sets political books and political blogs by relational and kernel SOM algorithms.

For the political books data set, the best map is obtained by using the on-line kernel SOM algorithm with the commute time kernel. The on-line relational SOM with the shortest path dissimilarity obtains comparable performance but the heat kernel gives poor results, whatever the value of γ. For the political blogs, the on-line relational SOM gives good results and manages to discriminate the two groups of blogs quite well, while its topographic error is rather large. On the other hand, the kernel SOM with the heat kernel (γ = 0.1) has a much better topographic error, while it badly discriminates the labels and produces a bad clustering of the nodes of the graph on the map (as measured by the modularity): this can be explained by the fact that, even though the map properly represents the topographic organization of the input space, the metric used to represent the data may not be the most accurate to emphasize some particular features of the data that can be of major interest for the user.

Figure 5: [...] the shortest path dissimilarity (left) and by the on-line kernel SOM with the commute time kernel (right).

In a second step, a hierarchical clustering of the prototypes was performed. Using the symbolic representation of the prototypes as $p_u \sim \sum_{i=1}^{n} \beta_{ui} x_i$, the dissimilarity between two prototypes can be expressed as:

$$\delta(p_u, p_{u'}) := -\frac{1}{2} (\beta_u - \beta_{u'})^T \Delta (\beta_u - \beta_{u'}) \qquad (6)$$

and used as an input in the hierarchical clustering algorithm (see [20], Theorem 1, for a justification of this formula). Only three and two clusters were kept for, respectively, the political books and the political blogs data sets, in order to try to retrieve the original labels (three labels for the books, two for the blogs). The resulting clusters are displayed in Figures 7 and 8. Moreover, the classes' purity and modularity are given in Table 2.
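Equation (6) only needs the coefficient vectors β and the dissimilarity matrix Δ, as the following minimal sketch shows. The function names and the toy data are assumptions made for illustration: Δ contains the squared distances between three points located at 0, 1 and 3 on a line, so that the result can be checked by hand.

```python
def prototype_dissimilarity(beta_u, beta_v, delta):
    """Evaluate -1/2 (beta_u - beta_v)^T Delta (beta_u - beta_v)."""
    diff = [a - b for a, b in zip(beta_u, beta_v)]
    n = len(diff)
    quad = sum(diff[i] * delta[i][j] * diff[j]
               for i in range(n) for j in range(n))
    return -0.5 * quad


def prototype_dissimilarity_matrix(betas, delta):
    """Pairwise prototype dissimilarities, usable as clustering input."""
    return [[prototype_dissimilarity(bu, bv, delta) for bv in betas]
            for bu in betas]


# Squared distances between the points 0, 1 and 3 (illustrative data only)
delta = [[0, 1, 9],
         [1, 0, 4],
         [9, 4, 0]]
# Three prototypes: the point 0, the midpoint of 0 and 1, and the point 3
betas = [[1, 0, 0], [0.5, 0.5, 0], [0, 0, 1]]
proto = prototype_dissimilarity_matrix(betas, delta)
```

In this Euclidean toy case the formula returns the squared distance between the corresponding convex combinations (for instance 6.25 between the midpoint 0.5 and the point 3), which is the behavior expected from the relational representation.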

Dissimilarity        Shortest path   Heat kernel   Heat kernel   Commute time
                     length          γ = 0.1       γ = 1         kernel
Political books
  modularity         0.50            -0.02         -0.00         0.41
  classes' purity    0.84            0.49          0.47          0.76
Political blogs
  modularity         0.39            0.04          -0.00         0.00
  classes' purity    0.91            0.52          0.58          0.52

Table 2: Modularity and purity of the classes obtained by a hierarchical clustering of the prototypes, for the data sets political books and political blogs.

Figure 6: [...] the shortest path dissimilarity (left) and by the on-line kernel SOM with the heat kernel γ = 0.1 (right).

As expected, hierarchical clustering tends to slightly decrease the classes' purity (compared to the neurons' purity) and to strongly increase the modularity. But it also affects which of the dissimilarities seems to represent the data better: for both data sets, the shortest path dissimilarity outperforms the dissimilarities based on kernels. This shows that the use of a kernel is not always the best possible choice for computing similarities/dissimilarities between observations and that allowing the use of a larger family of dissimilarities can be useful in some cases.

Figure 7: [...] the shortest path dissimilarity (left) and by the on-line kernel SOM with the commute time kernel (right).

4.3. Multiple relational SOM on simulated data

In this section, a simple example is used to test the algorithm and illustrate its behavior in the presence of complementary information. 200 observations, divided into 8 groups (indexed from 1 to 8 in the following), were generated using three different types of data:

• an unweighted graph, simulated similarly to the planted 3-partition graph described in [60]. The nodes of the groups 1 to 4 and the nodes of the groups 5 to 8 could not be distinguished in the graph structure: the edges within these two sets of nodes were randomly generated with a probability equal to 0.3. The edges between these two sets of nodes were randomly generated with a probability equal to 0.01;

• numerical data that were two-dimensional Gaussian vectors. The variables corresponding to observations of odd groups were simulated by Gaussian vectors with mean (0, 0) and independent components having a variance equal to 0.3, and the variables corresponding to observations of even groups were simulated by Gaussian vectors with mean (1, 1) and independent components having a variance equal to 0.3;

• a factor with 2 levels. Observations of groups 1, 2, 5, and 7 were assigned to the first level and observations of the other groups to the second level.
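The three simulated data sets listed above can be sketched as follows. The sketch assumes eight groups of equal size (25 observations each), which the text does not state, and follows the group-to-level assignment of the factor exactly as written; the function names and the random seed are illustrative choices.

```python
import random


def dice_distance(level_a, level_b):
    """Distance on the factor: 0 if the two levels are identical, 1 if not."""
    return 0 if level_a == level_b else 1


def simulate(n_per_group=25, seed=0):
    """Simulate one data set (assumption: groups of equal size)."""
    rng = random.Random(seed)
    groups = [g for g in range(1, 9) for _ in range(n_per_group)]
    n = len(groups)
    # Graph: groups 1-4 form one block of nodes, groups 5-8 the other
    block = [0 if g <= 4 else 1 for g in groups]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = 0.3 if block[i] == block[j] else 0.01
            if rng.random() < p:
                edges.append((i, j))
    # Numerical data: a variance of 0.3 is a standard deviation of sqrt(0.3)
    sd = 0.3 ** 0.5
    numeric = []
    for g in groups:
        mean = 0.0 if g % 2 == 1 else 1.0  # odd groups: (0, 0); even: (1, 1)
        numeric.append((rng.gauss(mean, sd), rng.gauss(mean, sd)))
    # Factor: groups 1, 2, 5 and 7 at the first level, as stated in the text
    factor = [1 if g in (1, 2, 5, 7) else 2 for g in groups]
    return groups, edges, numeric, factor


groups, edges, numeric, factor = simulate()
```

Each of the three outputs separates the observations into two halves only, which is why a single dissimilarity cannot recover the original groups on its own.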

Hence, only the combined knowledge of the three data sets gave access to the eight original groups. The multiple relational SOM algorithm was applied to this problem with the shortest path distance for the graph, the standard Euclidean distance for the numerical data and Dice's distance for the factor variable (equal to 0 if the factors are identical between the two observations and to 1 if not). The algorithm was compared with

• a multiple kernel SOM approach as described in [2], where the kernels used were the commute time kernel [57] for the graph and the Gaussian kernel for both the other data sets (the factor was recoded as a numeric variable using its disjunctive form). The parameter of the Gaussian kernel was set as recommended in [61];

• a standard relational SOM approach using one of the three data sets only. It was also compared to the dissimilarity SOM using numerical and factor data or all the three data sets, but used as if they came from the same data set with a Euclidean distance (when the graph was added to the numerical and factor data, it was in the form of its adjacency matrix).

The comparison was performed on 100 different data sets generated as previously described.

Figure 8: [...] the shortest path dissimilarity (left) and by the on-line kernel SOM with the heat kernel γ = 0.1 (right).

The performances of the different approaches were compared using the normalized mutual information [62] with respect to the original classes, the average node purity, taking again the original classes as a reference, and the topographic error [35]. The first two quality measures quantify the agreement between the original classes and the clustering provided by the SOM. The node purity has values between 0 and 1 and is equal to 1 when the two partitions are identical. The last quality measure, the topographic error, does not depend on the original classes but it quantifies the continuity of the map with respect to the input space metric. The results are given in Figure 9, which displays the distributions of the normalized mutual information, the nodes' purity and the topographic error, over the 100 data sets. Figure 10 provides examples of resulting maps.
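For reference, the normalized mutual information between two partitions can be sketched as below. Several normalizations exist; the one used here (division by the geometric mean of the two entropies) is an assumption and may differ from the variant defined in [62]. The function name and the toy partitions are illustrative.

```python
from collections import Counter
from math import log, sqrt


def normalized_mutual_information(labels_a, labels_b):
    """Mutual information divided by sqrt(H(a) * H(b)) (assumed variant)."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mutual = 0.0
    for (a, b), n_ab in count_ab.items():
        mutual += (n_ab / n) * log(n * n_ab / (count_a[a] * count_b[b]))
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())
    if h_a == 0.0 or h_b == 0.0:
        # A partition with a single class carries no information
        return 0.0
    return mutual / sqrt(h_a * h_b)


# Same partition up to a relabeling, then two unrelated partitions
same = normalized_mutual_information([1, 1, 2, 2], ["x", "x", "y", "y"])
unrelated = normalized_mutual_information([1, 1, 2, 2], [1, 2, 1, 2])
```

Because the measure compares partitions as a whole, it decreases when an original class is spread over several neurons, even if each of these neurons is pure.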

Figure 9: [...] topographic error (bottom) of multiple relational SOM, multiple kernel SOM and relational SOM used with all or two data sets (num & fac) simply merged in a single data set or with a single data set (graph, numeric or factor).

Taking into account the clustering quality (normalized mutual information), the node purity and the topographic quality, the multiple relational SOM outperforms the other methods. The difference between the use of the shortest-path dissimilarity and a kernel for the graph in a similar multiple setting is small but still significant (with p-values smaller than 10⁻⁹ for Wilcoxon paired tests).

[...]sion of the results because it penalizes the fact that the original clusters are separated into several neurons on the grid. This explains the good performance (despite its large variability), in terms of normalized mutual information, of the grid built from the numeric variables and the factor only, because this latter map contains many more empty clusters, as shown in Figure 10. On the contrary, the example of the map resulting from on-line multiple relational SOM in Figure 10 shows a good classification and a good organization according to the three types of information: the eight groups are almost perfectly distinguished by the algorithm.

Also note that the topographic error is not an optimal way to compare the results obtained with data sets that do not contain the same amount of information: indeed, the very good topographic error obtained by the map trained from the numeric data only or the factor only simply means that the topographic properties of these data are well preserved on the map, but this cannot be compared to the multiple relational SOM, the multiple kernel SOM or the map trained with all data and a standard SOM: these maps are supposed to preserve the topographic properties of all three sorts of data, which is a harder task than preserving the topographic property of only one sort (numeric, factor, graph) of data. In this case, merging all data in a single data set which is then passed as an input to a numeric SOM leads to a very bad topographic error (approximately 30 times larger than the one obtained with multiple relational SOM or multiple kernel SOM).

The evolution of the α, shown in Figure 10, is also interesting: Dice's distance, which is the only dissimilarity measure based on a non-noisy set of data, obtains larger weights than the other two dissimilarities. This is consistent with the fact that these data are indeed the best of the three data sets to distinguish between the original clusters: as shown in Figure 9, the map based on the factor is better in terms of normalized mutual information than the ones based on the numeric variables or on the graph only (its node purity is very low because it perfectly separates the data into two clusters in which four original clusters are equally mixed).

Figure 10: Summary of the experiment: the original graph and the original distribution of the numerical variables are given at the top of the figure; multiple rSOM results (second row) with the evolution of the α and the resulting map (disks have an area proportional to the number of observations and are colored according to the distribution of the original classes in the corresponding neuron); bottom: maps obtained using numeric variables and factor merged in a single data set and a simple Euclidean distance (left) and using all three data sets but a simple Euclidean distance (right).

The last example illustrates multiple relational SOM on data related to school-to-work transitions. We used the data in the survey Generation 98⁷. According to the French National Institute of Statistics (INSEE), 22.7% of young people under 25 were unemployed at the end of the first semester 2012.⁸ Hence, it is crucial to understand how the transition from school to employment or unemployment is achieved, in the current economic context. The data set contains information on 16 040 young people having graduated in 1998 and monitored during 94 months after having left school. The labor-market statuses have nine categories, labeled as follows: permanent-labor contract, fixed-term contract, apprenticeship contract, public temporary-labor contract, on-call contract, unemployed, inactive, military service, education. The following stylized facts are highlighted by a first descriptive analysis of the data as shown in Figure 11:

• permanent-labor contracts represent more than 20% of all statuses after one year and their ratio continues to increase until 50% after three years and almost 75% after seven years;

• the ratio of fixed-term contracts is more than 20% after one year on the labor market, but it is decreasing to 15% after three years and then seems to converge to 8%;

• almost 30% of the young graduates are unemployed after one year. This ratio is decreasing and becomes constant, 10%, after the fourth year.

The dissimilarities between sequences were computed using optimal matching (OM). Also known as edit distance or Levenshtein distance, optimal matching was first introduced in biology by [31] and used for aligning and comparing sequences. In social sciences, the first applications are due to [32]. The underlying idea of optimal matching is to transform one sequence into another using three possible operations: insertion, deletion and substitution. A cost is associated to each of the three operations. The dissimilarity between sequences is then computed as the cost associated to the smallest number of operations that transforms one sequence into the other. The method seems simple and relatively intuitive, but the choice of the costs is a delicate operation in social sciences. This topic is subject to lively debates in the literature [4, 5], mostly because of the difficulties in establishing an explicit and sound theoretical frame.

7 Available thanks to Génération 1998 à 7 ans - 2005, [producer] CEREQ, [diffusion] Centre Maurice Halbwachs (CMH).

8 All computations were performed with the free statistical software environment R (http://cran.r-project.org/, [25]). The graphical illustrations were carried out using the TraMineR package [63].
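The principle of optimal matching can be sketched with the classical dynamic programming recursion for edit distances. The code below is a generic illustration, not the TraMineR implementation used for the experiments; the function name, the default unit costs and the toy status codes (U, F, P) are assumptions.

```python
def optimal_matching(seq_a, seq_b, indel=1.0, substitution=None):
    """Minimal total cost of transforming seq_a into seq_b.

    indel is the cost of one insertion or deletion; substitution(a, b) is
    the cost of replacing a by b (default: 0 if equal, 1 otherwise).
    """
    if substitution is None:
        substitution = lambda a, b: 0.0 if a == b else 1.0
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * indel  # delete the first i elements of seq_a
    for j in range(1, cols):
        cost[0][j] = j * indel  # insert the first j elements of seq_b
    for i in range(1, rows):
        for j in range(1, cols):
            cost[i][j] = min(
                cost[i - 1][j] + indel,  # deletion
                cost[i][j - 1] + indel,  # insertion
                cost[i - 1][j - 1] + substitution(seq_a[i - 1], seq_b[j - 1]),
            )
    return cost[-1][-1]


# Two short careers: U = unemployed, F = fixed-term, P = permanent
d_careers = optimal_matching(["U", "U", "F", "P"], ["U", "F", "P", "P"])
# With unit costs the result is the Levenshtein distance
d_words = optimal_matching("kitten", "sitting")
```

Changing `indel` or `substitution` changes which alignments are considered cheap, which is precisely why the choice of the costs is a sensitive modeling decision.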

In our application, all career paths have the same length, the status of the graduate students being observed during 94 months. Hence, we suppose that there are no insertions or deletions and that only the substitution costs have to be defined for OM metrics. Among optimal-matching dissimilarities, we selected three dissimilarities: the OM with substitution costs computed from the transition matrix between statuses as proposed in [64], the Hamming dissimilarity (HAM, no insertion or deletion costs and a substitution cost equal to 1) and the Dynamic Hamming dissimilarity (DHD as described in [65]).
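When sequences have the same length and no insertion or deletion is used, the dissimilarity reduces to a sum of position-wise substitution costs, as sketched below. The Hamming cost follows the definition given above. The transition-based cost, 2 − p(a|b) − p(b|a), is one common convention and is an assumption here: it may not match the exact definition of [64]. All function names and the toy sequences are illustrative.

```python
from collections import Counter


def transition_rates(sequences):
    """Estimate p(b | a): share of the transitions leaving a that reach b."""
    pair_counts = Counter()
    from_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            from_counts[a] += 1
    return {pair: c / from_counts[pair[0]] for pair, c in pair_counts.items()}


def hamming_cost(a, b):
    """Substitution cost equal to 1 for two different statuses."""
    return 0.0 if a == b else 1.0


def transition_cost(a, b, rates):
    """Assumed convention: frequent transitions give cheap substitutions."""
    if a == b:
        return 0.0
    return 2.0 - rates.get((a, b), 0.0) - rates.get((b, a), 0.0)


def positionwise_dissimilarity(seq_a, seq_b, cost):
    """Sum of substitution costs, position by position (equal lengths only)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(cost(a, b) for a, b in zip(seq_a, seq_b))


# Toy careers: U = unemployed, F = fixed-term, P = permanent
careers = ["UUFP", "UFPP"]
rates = transition_rates(careers)
ham = positionwise_dissimilarity(careers[0], careers[1], hamming_cost)
om = positionwise_dissimilarity(
    careers[0], careers[1], lambda a, b: transition_cost(a, b, rates)
)
```

The Dynamic Hamming dissimilarity follows the same position-wise scheme with costs that depend on the position in the sequence; it is not reproduced here.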

In order to identify the role of the different dissimilarities in extracting typologies, we considered several samples drawn at random from the data. For each of the experiments below, 50 samples containing 1 000 input sequences each were considered. In order to assess the quality of the maps, two indexes were computed: the quantization error for quantifying the quality of the clustering and the topographic error for quantifying the quality of the mapping [66]. These quality criteria all depend on the dissimilarities used to train the map but the results are made comparable by using normalized dissimilarities.

Metric               OM          HAM         DHD
α-Mean               0.43111     0.28459     0.28429
α-Std                0.02912     0.01464     0.01523

Metric               OM          HAM         DHD         Optimally-tuned α
Quantization error   92.93672    121.67305   121.05520   114.84431
Topographic error    0.07390     0.08806     0.08124     0.05268

Table 3: Preliminary results for three OM metrics (average over 50 random subsamples): Optimally-tuned α (top table) and Quality criteria for the SOM clustering (bottom table).

The results are listed in Table 3. According to the mean values of the α's, the three dissimilarities contributed to extracting typologies. The Hamming and the dynamic Hamming dissimilarities have similar weights, while the OM with cost matrix defined from the transition matrix has the largest weight. The mean quantization error computed on the maps trained with the three dissimilarities optimally combined is larger than the quantization error computed on the map trained with the OM metric only. On the other hand, the topographic error is improved in the mixed case. In this case, the joint use of the three dissimilarities provides a trade-off between the quality of the clustering and the quality of the mapping. The results confirm the difficulty of defining adequate costs in optimal matching and the fact that the metric has to be chosen according to the aim of the study: building typologies (clustering) or visualizing data (mapping).
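The way the weights α enter the algorithm can be illustrated by the weighted combination of the dissimilarity matrices, with entries given by the sum over d of α_d δ^d_ij. The sketch below only builds this combined matrix for fixed weights; the stochastic gradient step that tunes α is not reproduced, and the function name and the toy matrices are assumptions.

```python
def combine_dissimilarities(matrices, alpha):
    """Entry-wise weighted sum of D dissimilarity matrices of equal size.

    The weights are assumed non-negative and summing to one, as the mean
    values of the alphas reported in Table 3 nearly do.
    """
    n = len(matrices[0])
    return [
        [sum(a * m[i][j] for a, m in zip(alpha, matrices)) for j in range(n)]
        for i in range(n)
    ]


# Two toy dissimilarity matrices on two observations (illustrative only)
delta_1 = [[0.0, 2.0], [2.0, 0.0]]
delta_2 = [[0.0, 4.0], [4.0, 0.0]]
combined = combine_dissimilarities([delta_1, delta_2], [0.25, 0.75])
```

With the mean weights of Table 3, the OM dissimilarity would contribute roughly 43% of each combined entry and the two Hamming-type dissimilarities about 28% each, provided the three matrices are normalized to comparable scales.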

Finally, multiple rSOM was trained on the entire data set. The final map is illustrated in Figure 12. Several typologies emerge from the map: a fast access to permanent contracts (light blue), a transition through fixed-term contracts before obtaining stable ones (dark and then light blue), a holding on precarious jobs (dark blue), a public temporary contract (dark green) or an on-call contract (pink) eventually leading to a stable one, a long period of inactivity (yellow) or unemployment (red) with a gradual return to employment. The mapping also shows a progressive transition between trajectories of exclusion on the west and quick integration on the east. A more detailed study of this data set is available in [67].

5. Conclusion

An on-line version of relational SOM is introduced in this paper. It combines the standard advantage of the stochastic version of SOM (better organization) with relational SOM, which is able to handle data described by dissimilarities. This approach is extended to the case where several dissimilarities are available for the initial data set. On-line multiple relational SOM handles several dissimilarities by combining them in an optimal fashion. The algorithm shows good performances, compared to alternative methods, in projecting data described by numerical variables, by categorical variables or by relations, and is helpful to understand which dissimilarity is the most relevant when several are available. However, in its multiple dissimilarity version, the main drawback of the proposed relational SOM algorithm is related to the computation time: a sparse version should be investigated to allow us to handle very large data sets.

6. Acknowledgements

We thank the anonymous referees for their thorough comments and suggestions and for valuable hints on interesting references.

References

[1] N. Villa-Vialaneix, M. Olteanu, C. Cierco-Ayrolles, Carte auto-organisatrice pour graphes étiquetés, in: Actes des Ateliers FGG (Fouille de Grands Graphes), colloque EGC (Extraction et Gestion de Connaissances), Toulouse, France, 2013.

[2] M. Olteanu, N. Villa-Vialaneix, C. Cierco-Ayrolles, Multiple kernel self-organizing maps, in: M. Verleysen (Ed.), XXIst European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), d-side publications, Bruges, Belgium, 2013, pp. 83–88.

[3] E. Pękalska, R. Duin, The Dissimilarity Representation for Pattern Recognition. Foundations and Applications, World Scientific, 2005.

[4] A. Abbott, A. Tsay, Sequence analysis and optimal matching methods in sociology, review and prospect, Sociological Methods and Research 29 (2000) 3–33.

[5] L. Wu, Some comments on "Sequence analysis and optimal matching methods in sociology, review and prospect", Sociological Methods and Research 29 (2000) 41–64.

[6] M. Cottrell, M. Olteanu, F. Rossi, J. Rynkiewicz, N. Villa-Vialaneix, Neural networks for complex data, Künstliche Intelligenz 26 (2012) 1–8.

[7] M. Cottrell, P. Letrémy, How to use the Kohonen algorithm to simultaneously analyse individuals in a survey, Neurocomputing 63 (2005) 193–207.

[8] T. Kohonen, P. Somervuo, Self-organizing maps of symbol strings, Neurocomputing 21 (1998) 19–30.

[9] B. Conan-Guez, F. Rossi, A. El Golli, Fast algorithm and implementation of dissimilarity self-organizing maps, Neural Networks 19 (2006) 855–863.

[10] D. MacDonald, C. Fyfe, The kernel self organising map, in: Proceedings of 4th International Conference on knowledge-based intelligence engineering systems and applied technologies, 2000, pp. 317–320.

[11] [...] Systems 12 (2002) 117–135.

[12] N. Villa, F. Rossi, A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph, in: 6th International Workshop on Self-Organizing Maps (WSOM), Neuroinformatics Group, Bielefeld University, Bielefeld, Germany, 2007. doi:10.2390/biecoll-wsom2007-139.

[13] R. Boulet, B. Jouve, F. Rossi, N. Villa, Batch kernel SOM and related Laplacian methods for social network analysis, Neurocomputing 71 (2008) 1257–1273.

[14] T. Gärtner, Kernel for Structured Data, volume 72 of Series in Machine Perception and Artificial Intelligence, World Scientific, 2008.

[15] I. Schoenberg, Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espace distanciés vectoriellement applicable sur l'espace de Hilbert", Annals of Mathematics 36 (1935) 724–732.

[16] G. Young, A. Householder, Discussion of a set of points in terms of their mutual distances, Psychometrika 3 (1938) 19–22.

[17] N. Krislock, H. Wolkowicz, Handbook on Semidefinite, Conic and Polynomial Optimization, volume 166 of International Series in Operations Research & Management Science, Springer, 2012, pp. 879–914.

[18] B. Hammer, A. Hasenfuss, F. Rossi, M. Strickert, Topographic processing of relational data, in: B. U. Neuroinformatics Group (Ed.), Proceedings of the 6th Workshop on Self-Organizing Maps (WSOM 07), Bielefeld, Germany, 2007.

[19] F. Rossi, A. Hasenfuss, B. Hammer, Accelerating relational clustering algorithms with sparse prototype representation, in: Proceedings of the 6th Workshop on Self-Organizing Maps (WSOM 07), Neuroinformatics Group, Bielefeld University, Bielefeld, Germany, 2007.

[20] B. Hammer, A. Hasenfuss, Topographic mapping of large dissimilarity data sets, Neural Computation 22 (2010) 2229–2284.

[21] [...] X. Zhu, Topographic mapping of dissimilarity data, in: J. Laaksonen, T. Honkela (Eds.), Advances in Self-Organizing Maps (Proceedings of the 8th Workshop on Self-Organizing Maps, WSOM 2011), volume 6731 of Lecture Notes in Computer Science, Springer, Espoo, Finland, 2011, pp. 1–15.

[22] L. Goldfarb, A unified approach to pattern recognition, Pattern Recognition 17 (1984) 575–582.

[23] J. Fort, P. Letremy, M. Cottrell, Advantages and drawbacks of the batch Kohonen algorithm, in: M. Verleysen (Ed.), Proceedings of 10th European Symposium on Artificial Neural Networks (ESANN 2002), Bruges, Belgium, 2002, pp. 223–230.

[24] M. Olteanu, N. Villa-Vialaneix, M. Cottrell, On-line relational SOM for dissimilarity data, in: P. Estevez, J. Principe, P. Zegers, G. Barreto (Eds.), Advances in Self-Organizing Maps (Proceedings of WSOM 2012), volume 198 of AISC (Advances in Intelligent Systems and Computing), Springer Verlag, Berlin, Heidelberg, Santiago, Chile, 2012, pp. 13–22. doi:10.1007/978-3-642-35230-0_2.

[25] R Development Core Team, R: A Language and Environment for Statistical Computing, Vienna, Austria, 2012. URL: http://www.R-project.org, ISBN 3-900051-07-0.

[26] S. Theußl, A. Zeileis, Collaborative software development using R-Forge, The R Journal 1 (2009) 9–14. URL: http://journal.R-project.org/.

[27] RStudio and Inc., shiny: Web Application Framework for R, 2013. URL: http://CRAN.R-project.org/package=shiny, R package version 0.6.0.

[28] T. Heskes, Energy functions for self-organizing maps, in: E. Oja, S. Kaski (Eds.), Kohonen Maps, Elsevier, Amsterdam, 1999, pp. 303–315. URL: http://www.snn.ru.nl/reports/Heskes.wsom.ps.gz.

[29] A. El Golli, F. Rossi, B. Conan-Guez, Y. Lechevallier, Une adaptation des cartes auto-organisatrices pour des données décrites par un tableau de dissimilarités, Revue de Statistique Appliquée LIV (2006) 33–64.
