Madalina Olteanu a
,Nathalie Villa-Vialaneix a,b
a
SAMM,EA4543, UniversitéParis1,F-75634 Paris, Frane
b
INRA,UR875, MIAT, F-31326Castanet-Tolosan,Frane
Abstrat
Insomeappliationsandinordertoaddressreal-worldsituationsbetter,data
maybemoreomplexthansimplenumerialvetors. Insomeexamples,data
an beknownonlythrough their pairwisedissimilaritiesorthrough multiple
dissimilarities, eah of them desribing a partiular feature of the data set.
Several variants of the Self Organizing Map (SOM) algorithm were intro-
dued to generalize the original algorithm to the framework of dissimilarity
data. WhereasmedianSOM isbasedonaroughrepresentation oftheproto-
types,relationalSOMallowsrepresentingtheseprototypesbyavirtuallinear
ombination of all elements in the data set, referring to a pseudo-eulidean
framework. In the presentartile,anon-lineversionof relationalSOMis in-
troduedandstudied. SimilarlytothesituationintheEulideanframework,
this on-line algorithm provides a better organization and is muh less sen-
sible toprototype initializationthan standard (bath) relationalSOM. In a
moregeneralase,thisstohastiversionallowsustointegrateanadditional
stohastigradient desentstep in the algorithmwhihan tune the respe-
tiveweightsofseveraldissimilaritiesinanoptimalway: theresultingmultiple
relationalSOM thushastheabilitytointegrateseveralsouresofdataofdif-
ferenttypes, ortomakeaonsensusbetweenseveraldissimilaritiesdesribing
the same data. The algorithmsintroduedin this manusript are tested on
several data sets, inluding ategorial data and graphs. On-line relational
SOM is urrently available in the R pakage SOMbrero that an be down-
loadedathttp://sombrero.r-forge. r-p roje t.o rg/ordiretlytestedon
its Web UserInterfae athttp://shiny.nathalievilla .org /som brer o.
Keywords: Self-OrganizingMap, Dissimilarity,Kernel, On-line,Multiple
Email addresses: madalina.olteanuuniv-paris1.fr(MadalinaOlteanu),
nathalie.villatoulouse.inra.fr(NathalieVilla-Vialaneix)
1. Introdution
In many real-world appliations, data an not always be desribed by a
xed set of numerial attributes. This is the ase, for instane, when data
are desribed by ategorial variables or by relations between objets (i.e.,
personsinvolvedinasoialnetwork). Thisissueanbeeventrikierwhenthe
dataareomposedofseveralsouresofnonhomogeneousinformation(e.g.,a
soialnetworktogetherwith attributesonthe nodes asin[1,2℄). Aommon
solutiontoaddressthis kindofissue istouseameasure ofresemblane(i.e.,
a similarity or a dissimilarity) that an handle ategorial variables, graphs
or fous on spei aspets of the data, designed by expertise knowledge
[3℄. Many standard methods for data mining have been generalized to non
vetorial data, reently inluding prototype-based lustering, even though,
in some ases, the hoie of the most relevant dissimilarityremains anopen
issue (see [4, 5℄ for a disussion on this topi in the eld of soial siene).
The reentpaper[6℄providesanoverview ofseveral methodsthat have been
proposed totakle omplex data with neural networks.
Inpartiular, several extensions of theSelf-OrganizingMap(SOM) algo-
rithmhavebeenproposed. OneapproahonsistsinextendingSOM toate-
gorialdatabyusingamethodsimilartoMultipleCorrespondene Analysis,
[7℄. Another approah uses the median priniple whih onsists in repla-
ingthe standard omputationof the prototypes by anapproximationin the
original data set. This priniple was used to extend SOM to dissimilarity
data in [8℄. One of the main drawbaks of this approah is that foring the
prototypes to be hosen among the data set is very restritive; in order to
inrease the exibilityof the representation, [9℄ proposestorepresent alass
by several prototypes, allhosen among the original data set. However this
methodinreasestheomputationaltime,whileprototypesremainrestrited
totheoriginaldatasetandmaygeneratepossiblesamplingorsparsityissues.
An alternative to median-based algorithms relies on a method that is
lose tothe standard algorithm used in the Eulidean ase. This methodis
based onthe idea that prototypes may be expressed as linear ombinations
of the original input data. In kernel SOM framework, this setting is made
natural by the use of the kernel, whih maps the original data into a (large
dimensional) Eulidean spae (see [10, 11, 12℄ for on-line versions and [13℄
data suh as strings, nodes in a graph or graphs themselves [14℄. In some
ases, the data are solely desribed by a dissimilarity matrix. [15, 16, 17℄
give neessary and suient onditions for a symmetri matrix to be a dis-
tane matrix in an Eulidean spae but, as pointed out by [3℄, the lass of
similarity/dissimilaritythatan beembedded inaEulidean spae israther
limited and doesnot aommodate ona numberof useful measures already
developed intheliterature. Inthis ase,[18,19,20,21℄propose tointrodue
an impliitonvex ombination of the original data inorder to extend the
lassial bath versions of SOM to dissimilarity data: this approah impli-
itly uses the embedding of the original data ina pseudo-eulidean spae, as
dened in[22℄.
However, bathversionsoftheSOMalgorithmareknown, atleastforthe
standardnumerialSOM[23℄,topresentseveraldrawbakssuhaspoororga-
nizationandstrongdependenyontheprototypeinitialization. Thisproblem
may be partially ountered using PCA orMDS initializations, but when no
goodinitializationisavailable,astohasti(alsoalledon-line)versionofthe
algorithmanbeverybeneial. Thepurposeofthepresentpaperistointro-
dueandjustifytheon-lineversionofrelationalSOM,asalreadyproposedin
[24℄. Suh anapproahleads toa better organizationof the map. Addition-
ally,takingadvantageofthestohastisheme,relationalSOMisextendedto
integrateseveral souresof non homogeneousinformationbyusing anadap-
tive onvex ombinationof dissimilarities. The weightsof eah dissimilarity
areupdatedduringtheSOMlearningproessbyanadditionalstohastigra-
dient desent step. In the remaining of this manusript, Setion2 desribes
the on-line extension of the relational SOM algorithm, already studied in
[24℄, while Setion 3 desribes how this approah an be used to integrate
multipleinformationomingeitherfromdierent datasets or fromdierent
dissimilarity measures. Finally, Setion 4 illustrates the approah on simu-
latedandreal-worlddata setsandomparesitwithpreviousliterature. Note
that theon-linerelationalSOM isavailableinthe R[25℄pakageSOMbrero,
whih an be downloaded on R-Forge [26℄
1
or tested on its shiny [27℄ Web
User Interfae athttp://shiny.nathalievilla .org /som brer o.
1
http://sombrero.r-forge.r-projet.org/
Let us reall that the Self-Organizing Map (SOM) algorithm aims at
mapping
n
input datax 1, ..., x n into a low dimensional grid omposed
of U
units. A prototype p u, valued in the same spae as the input data, is
U
units. A prototypep u, valued in the same spae as the input data, is
assoiatedtoeahunit
u ∈ {1, . . . , U }
ofthegrid. Thegridinduesanaturaldistane
d
on the map: for every pair of neurons(u, u ′ )
,d (u, u ′ )
is usuallydened as the length of the shortest path between
u
andu ′ (although other
topologiesaresometimes used,inludingthestandardEulidean distane on
thegrid). Thealgorithmaimsatlusteringtogethersimilarobservationsand
alsoatpreserving theoriginaltopologyofthe datasetonthemap (i.e.,lose
observations are lustered into lose units on the map, distant observations
are lustered intodistant units on the map). In order todo so, an iterative
proess is performed by alternating two steps. The original algorithm for
numerial vetors may beresumed asfollows:
•
an assignment step where one observation (on-line version) or all ob- servations (bath version) is/are aeted to the losest prototype (inthe sense of the Eulidean distane):
f (x i ) = arg min
u=1,...,U kx i − p u k,
•
arepresentationstepwhereallprototypesareupdatedaordingtothe new assignment. For the on-line version of the algorithm, this step isperformed by mimikinga stohasti gradient desent sheme:
p
newu = poldu + µH (d (f (x i ), u))
x i − p
oldu
,
(1)where
H
is the neighborhood funtion verifying the assumptionsH : R + → R +, H(0) = 1
and lim x→+∞ H(x) = 0
, and µ
is a training
parameter. Generally,
H
andµ
are supposedtobedereasingwith thenumberof iterationsduring the trainingproedure.
TheoriginalSOM algorithmdesribed abovedoesnotpossesaost fun-
tion and is not exatly a gradient desent, at least not in the ontinuous
ase. However, when the size of the neighborhood isxed and with a modi-
edassignmentstep,[28℄provedthatSOMisminimizingthefollowingenergy
funtion:
E ((p u ) u ) = X U u=1
Z
δ u,f(x)
X U l=1
H(d(l, u))kx − p l k 2 P (dx) ,
where
δ u,f(x) =
1,
iff (x) = u 0,
otherwise .2.1. SOM for dissimilarity data
Intheasewheretheinputdatatakevaluesinanarbitraryinputspae
G
,a natural Eulidean struture is not neessarily assoiated with
G
. Instead,thedissemblanebetweentheobservationsanbedesribedbyadissimilarity
measure
∆ = (δ ij ) i,j=1,...,n suh that ∆
is non negative (δ ij ≥ 0
), symmetri
(
δ ij = δ ji) and null on the diagonal (δ ii = 0
). In this ase however, the
assignment step annot be arried out straightforwardly sine the distanes
between the input data and the prototypes are not bediretly omputable.
Severalextensions ofthe SOM algorithmhavebeen proposed inthis on-
text: [8℄proposesthemedianSOM wheretheprototypesarehosenamong
theinputdata
(x i ) iinabathframework. Theassignmentstepisthensimilar
to the Eulidean framework, with the dissimilarity replaing the Eulidean
norm. The representation step simply nds the prototypes that minimize
the energyof themap by anexhaustivesearh amongtheinputdata. [9,29℄
extend this work by using several observations instead of a unique one for
eah prototype and by proposing a fast implementation of the algorithm.
Nevertheless, hoosing the prototypes among the input data is very restri-
tiveand using several observations foreahprototype strongly inreases the
omputationaltime needed totrain the map.
Tooveromethis diulty, the solutionproposed by [18,19,20,21℄isto
rely on the pseudo-eulidean framework: indeed, [22℄ pointed out that any
data desribed by a symmetri dissimilarity matrix an be embedded in a
spae onsisting of the orthogonal diret sum of two Eulidean spaes, for
whih the inner produt operation is denitepositiveon the rst spae and
denite negative on the seond. Relying on this framework, and similarly
to the kernel SOM approah [19℄, the prototypes are supposed to be sym-
bolionvexombinationsoftheoriginaldata(atually,onvexombinations
of their impliit embedding in the pseudo-eulidean spae):
p u ∼ P
i β ui x i
with
P
i β ui = 1 and β ui ≥ 0
2. If β u
denotes the vetor (β u1 , . . . , β un )
, thedistane inthe assignment step an bewritten interms of
∆
andβ u only:
δ(x i , p u ) ≡ ∆ i β u − 1
2 β u T ∆ β u .
(2)where
∆ i is the i
-th row of ∆
(the formula is justied and proved in
Appendix A). This algorithm, alled relational SOM, was proposed in the
bathframework wheretherepresentationstep onsistsinupdatingthe on-
vex ombinationby a mean alulation:
β ui = H(d(f (x i ), u)) P
i ′ H(d(f (x i ′ ), u)) .
This approah is very similar to the bath kernel SOM desribed in [12,
13℄. In kernel SOM, the Eulidean framework is justied by the denition
of a kernel
K : G × G → R
that impliitly maps the data into a Hilbertspae where the inner produt is diretly available via the kernel. Atually,
bathkernelSOMandbathrelationalSOMareequivalentforadissimilarity
dened fromthe kernel by:
δ(x i , x j ) := K(x i , x i ) + K(x j , x j ) − 2K(x i , x j ).
(3)Reiproally,ifthedissimilaritymatrix
∆
anbeembeddedinaEulideanspae (i.e., if it fullls the ondition given in [15, 16, 17℄ whih is that the
matrixwithelements
s ij = (δ(x i , x n ) 2 + δ(x j , x n ) 2 − δ(x i , x j ) 2 ) /2
ispositive,or, similarly,if the matrix with elements
s(i, j ) = − 1
2 δ 2 (x i , x j ) − 1 n
X n k=1
δ 2 (x i , x k ) − 1 n
X n k=1
δ 2 (x k , x j ) + 1 n 2
X n k,k ′ =1
δ 2 (x k , x k ′ )
!
as proposed in [30℄, ispositive), then relationalSOM is equivalent tokernel
SOM used with the matrix
(s ij ) ij, whih, in this ase, isa kernel. However,
as explained in[3℄, some useful dissimilarities (e.g., shortest path lengthsin
graphs or optimal mathing dissimilarities for sequenes of events, [31, 32℄)
2
Note that this sum has no real meaning, most of the times, as
G
is not neessarilyequippedwitha
+
operationneitherwithamultipliationbyasalar. Itsimplyimpliitly referstothe+
operationin theunderlyingpseudo-eulideanspae: theformaldenition ofp u
isgiveninAppendix A.Eulidean spae. In theseases, thedissimilarityan beturnedintoakernel
using various pre-proessings, as desribed in [33℄ but then, relational SOM
and kernel SOM are nolonger idential.
2.2. On-line relational SOM
Asexplainedin[23℄,althoughbathSOMpossessesthenie propertiesof
beingdeterministiandofusuallyonverginginafewiterations,ithasseveral
drawbaks suh as organizing the map rather poorly, produing unbalaned
lasses and being strongly dependent on the initialization. Hene, using
the same ideas as [18, 20℄, we introdue the on-line relational SOM, whih
generalizes the on-lineSOM tothe ase of dissimilarity data. The proposed
methodisdesribedinAlgorithm1. In thisalgorithm,onlyone observation,
Algorithm 1 On-linerelationalSOM
1: Forall
u = 1, . . . , U
andi = 1, . . . , n,
initializeβ ui 0 suh thatβ ui 0 ≥ 0
and
P n
i β ui 0 = 1.
2: for t=1,...,T do
3: Randomly hoose aninput
x i
4: Assignment step: nd the unit of the losest prototype
f t (x i ) ← arg min
u=1,...,U
β u t−1 ∆
i − 1
2 (β u t−1 ) T ∆ β u t−1
5: Representation step:
∀ u = 1, . . . , U
,β u t ← β u t−1 + µ(t)H t (d(f t (x i ), u)) 1 i − β u t−1
where
1 i is a vetor with asingle non nulloeientat the i
thposition,
equal to one.
6: end for
randomlyhosen,isassignedtoaunitofthemapateahiterationstep. The
representation step isdrawn fromEquation (1) by using a similarapproah
to update the prototypes' oordinates
(β ui ) ui. Note that the onstraints on
(β ui ) ui are preserved sine:
• P
i β ui t = 1 (as demonstrated inAppendix B);
• β ui t ≥ 0
for anyu
andi
as long asµ(t)
is small enough (µ(t)H t must
simplybesmaller than 1).
This latter ondition is easy enough to handle. In our experiments, the
parameters of the algorithmare hosen aording to[34℄: the neighborhood
H t dereases in a pieewise linear way, starting from a neighborhood whih orresponds tothe wholegrid up toaneighborhoodrestrited totheneuron
itself;
µ(t)
vanishes at the rate of1/t
.2.3. Disussion on thealgorithm: relationsto previous algorithms, omplex-
ity and onvergene
Ifthe dissimilarity matrix is a Eulidean distane, then the on-line rela-
tionalSOMisexatlyidentialtothestandardnumerialSOMaslongasthe
prototypesof theoriginalSOM are initializedinthe onvex hullofthe origi-
naldata(i.e.,theinitialprototypesanbewritten
p 0 u = P
i β ui 0 x i
). Similarly,the on-linerelationalSOM isidential toon-linekernel SOMasdesribedin
[10, 11, 12℄for a dissimilarity dened from a kernel
K
by Equation (3)or ifthe dissimilarityfulllsone of the onditions in[15,16, 17℄.
Moreover, if one wantsto generalize dissimilarities to non-symmetri re-
lations(suhas,forexample,graph-basedomparisonsofproteinngerprint
graphs),adissimilaritymatrixomputedasthehalf-sumofpairwiserelations
may beonsidered as the input for the algorithm.
Inordertoillustratethe performanesoftheon-linerelationalSOMom-
pared tothe bath implementation,500 points are onsidered, sampledran-
domlyfromthe uniformdistributionin
[0, 1] 2. Thedissimilarityisomputed as the length of the shortest path in the graph indued by the Delaunay
triangulation (this graph is displayed in Figure 1). Note that this dissim-
ilarity is not exatly equivalent to the Eulidean
R 2-metri, sine it is not
even Eulidean. ThebathversionofrelationalSOM andthe on-lineversion
of relational SOM were trained on idential
10 × 10
grid strutures. Thealgorithms were trained either with idential initializations, or with a PCA
initialization 3
forthe bathSOM,whihisthestandardinitializationusedto
alleviatetheinitializationdependenyofthisalgorithm. Resultsareavailable
in Figure 1 and learly show a muh better organization of the prototypes
in the nal grid provided by the on-line version of the algorithm. When the
3
Dissimilarity PCA was used and then properly re-saled to satisfy the ondition
P
i β ui = 1
.0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Figure 1: 500pointssampled fromtheuniform distributionin
[0, 1] 2
andtheirDelaunaygraph (top left) and map organizations obtained by relational on-line SOM (top right)
andrelationalbathSOM(bottomleft)withrandominitializationandbyrelationalbath
SOMwithdissimilarity-PCAinitialization(bottom right).
prototypesareinitializedwithaPCA,the organizationof themapprodued
bythe bathkernel SOMis muhbetterbut stillslightlyworse thanthe one
obtained with the on-line version and a random initialization. This visual
eet is onrmed when alulating the topographi error [35℄: this error
quanties the ontinuity of the map with respet to the input spae metri
by ounting the number of times the seond best mathing unit of a given
observationbelongstothe diretneighborhoodofthe best mathingunitfor
this observation. A topographi error equalto 0 means that allseond best
mathing units are in the diret neighborhood of the winner neurons and
thus that the originaltopology ofthe data is well preserved onthe map. In
this simple example, it is equal to 0.01 for the on-line relational SOM, to
0.176 for the bathrelationalSOM with PCAinitializationand to0.264 for
the bath relational SOM with random initialization. Hene, the lassial
initialization dependeny of the bath version of the algorithm, as already
no goodinitializationis present (i.e., when PCA or MDS are bad initializa-
tion strategies), the on-line version an then bevery beneial. Finally, the
omplexity of the on-line and bath versions are similar (of order
O(Un 2 )
)withusually asmallernumberofiterationsneeded tostabilizethebathver-
sion: the onvergene of the bath version is attained with quadrati speed
while the on-line version onverges with a linear speed. However, the better
organization of the map ompensates for this small loss in omputational
time. Finally,letusremarkthat formallyspeaking,inthe pseudo-Eulidean
setting, the onvergene of both algorithms (on-line and bath) is even not
guaranteed(saddlepointsan bepresent insteadof loaloptima,as pointed
out in [21℄) but, in pratialappliations,divergene was neverobserved.
3. Integrating multiple dissimilarities
Insome spei appliations,the user isinterested insimultaneously an-
alyzing several soures of information: a graph together with additional in-
formation known on its nodes, numerial variables measured on individuals
together with fators desribing these individuals... This situation is often
referred to as multiple view data and suh data are quite ommon in a
numberofelds: gene lusteringfromexpression prolesandontologyinfor-
mation[36℄inbiology,nodelusteringinasoialnetworktakingintoaount
attributes that desribe the nodes [37, 1℄ in soial sienes, and moleules
lustering fromngerprintsand spatialstrutures[38℄inhemistry. Inother
spei appliations,the data set an bedesribed byseveral dissimilarities,
eah enoding spei features of the data but none of them being aknowl-
edged as more informative than the others: in soial sienes for instane,
the hoie of a good dissimilarity to desribe the resemblane between two
event time series isstill anopen issue [4, 5℄.
Theombinationofallsouresofinformationorofseveraldissimilaritiesis
ahallenging problemthataimsatinreasingthe relevaneof thelustering.
In lustering, this issue has already been takled by dierent approahes:
some rely on lustering ensembles, ombining together the lusterings ob-
tained from eah view orfrom eahdissimilarityinto a onsensus lustering
[39℄. A more omplexstrategy, desribed in[40℄,iterativelyupdates the dif-
ferent lusterings using a global log-likelihoodapproah until they onverge
to a onsensus. Other authors propose to onatenate all data/views prior
to the lustering. If kernels are available, this method is known as multi-
ombinationandthe oeientsoftheonvexombinationare optimizedto-
getherwiththe lustering[41,42℄. Inasimilarway, ifthe dataaredesribed
by numerial variables belonging to dierent feature groups, [43℄ proposes
to weight eah group and tooptimizesimultaneously the lustering and the
weights of the groups.
For the SOM algorithm as well, a few artiles takle related issues: in
partiular, [44℄ ombines numeri and binary variables to produe a single
map by optimizing two quantization energies in parallel and [1, 2℄ use a
multiplekernel framework to integratevarious information.
In the present setion, we use asimilar approah by ombining dierent
dissimilarities in a onvex ombination. We propose an algorithm whih
learns an optimalombinationon-line,by minimizingthe energy funtion.
3.1. Computing a multiple dissimilarity
Suppose now that the observations
x 1, ..., x n are not desribed by a
single dissimilarity matrix
∆
,but byD
dissimilarity matries∆ 1, ..., ∆ D,
where
∆ d = δ d (x i , x j )
ij
. The dissimilaritiesan beeither dierentdissimi- laritiesomputedonthesamedataordissimilaritiesomputedfromdierentvariables measured on the same individuals (e.g., a dissimilarity that mea-
sures proximitiesbetweennodes inagraphandadissimilaritythatmeasures
proximities between the node labels, see Setion 4.3for anexample).
Similarly to the multiple kernel approah desribed in [45℄ or in [1℄ (for
multiple kernel SOM), we propose to ombine all the dissimilarities into a
single one, dened asa onvex ombination:
δ ij α = X D d=1
α d δ ij d (4)
where
α d ≥ 0
andP D
d=1 α d = 1. IntheEulideanframework,thisapproahis
stritlyequivalenttothemultiplekernelSOM approahbeause
kx i − x j k 2 d = hx i − x j , x i − x j i d (multiple kernel is a onvex ombination of dot produts
whereasEquation(4)isbasedonaonvexombinationofsquareddistanes).
Ifthe
(α d )
aregiven,relationalSOMbasedonthedissimilarityintrodued inEquation(4)aimsatminimizing(over(β u ) u)thefollowingenergyfuntion
E ((β u ) u , (α d ) d ) = X U u=1
X n i=1
H (d (f (x i ), u)) δ α (x i , p u (β u )) ,
where
δ α (x i , p u (β u ))
is dened as in Equation(2) byδ α (x i , p u (β u )) ≡ ∆ α i β u − 1
2 β u T ∆ α β u (5)
with
∆ α = P
d α d ∆ d
When there is no a-priori on the
(α d ) d, we propose to inlude the opti-
mizationoftheonvexombinationwithintheon-linealgorithmwhihtrains
themap. Thisideaissimilartothe oneproposedin[46℄foroptimizingaker-
nelparameterinvetorquantizationalgorithms. More preisely,astohasti
gradient desent step is added to the original on-line relational SOM algo-
rithm to optimize the energy
E ((β ui ) ui , (α d ) d )
, over both(β ui ) ji and (α d ) d.
Toperformthestohastigradientdesentsteponthe
(α d )
,theomputationof the derivativeof
E| x i = X U
u=1
H (d (f (x i ), u)) δ α (x i , p u (β u ))
(the ontribution of the randomly hosen observation
(x i ) i to the energy)
with respet to
α
is needed. Sine∂
∂α d
[δ α (x i , p u )] = δ d (x i , p u ),
we have
D id = ∂E| x i
∂α d
= X U u=1
H (d (f (x i ), u))
∆ d i β u − 1
2 β u T ∆ d β u
.
Followinganidea similarto thatof [45℄,the SOM istrained by perform-
ing, alternatively,the standard stepsof the SOMalgorithm(i.e., assignment
and representation steps) and a gradient desent step for the
(α i ) i. The
methodology isdesribed inAlgorithm 2.
1: Forall
u = 1, . . . , U
andi = 1, . . . , n,
initializeβ ui 0 suh thatβ ui 0 ≥ 0
and
P n
i=1 β ui 0 = 1.
2: For all
d = 1, . . . , D
, initializeα 0 d ∈ [0, 1]
stP
d α 0 d = 1. return δ α,0 ← P
d α 0 d δ d
.3: for t=1,...,T do
4: Randomly hoose aninput
x i
5: Assignment step: nd the unit of the losest prototype
f t (x i ) ← arg min
u=1,...,U δ α,t−1 (x i , p u (β u ))
where
δ α,t−1 (x i , p u (β u ))
isdened asin Equation (5).6: Representation step: update allprototypes aording to the new as-
signment:
∀ u = 1, . . . , U
,β u t ← β u t−1 + µ(t)H (d (f (x i ), u)) 1 i − β u t−1
7: Gradient desent step: update the onvex ombination parameters:
∀ d = 1, . . . , D
,α t d ← α t−1 d + ν(t)D d t
where
D t d isthe desent diretionand update δ α,t δ α,t ← X
d
α d t δ d .
8: end for
To ensure that the gradient step respets the onstraints on
α
(α d ≥ 0
and
P
d α d = 1), the followingstrategy is used: similarlyto [47, 48,45℄, the
gradient
∂E t− 1
| xi
∂α d
d
isredued and projetedsuhthatthe non-negativityof
α
is ensured. The followingmodieddesent step is thus used:D e d =
0
ifα d = 0
andD d − D d 0 > 0
−D d + D d 0 if α d > 0
and d 6= d 0 P
d6=d 0 , α d >0 (D d − D d 0 ) otherwise
The desent step
ν(t)
is dereased with the standard rate ofν 0 /t
with aninitial
ν 0 smallenoughtoensure the positivity onstraint on(α d ) d.
In this setion, several appliations, on simulated or real-life data sets,
illustrate the performanes of the proposed methods. Setion 4.1 ompares
on-line and bath relationalSOM on aDNAbaroding dataset, Setion 4.2
ompares the use of dissimilarities and kernels for mapping two politial
graphsintoagrid,Setion4.3illustratestheeienyoftheuseofamultiple
dissimilarityapproahonasimulateddatasetand,nally,Setion4.4applies
the multiplerelationalSOM toalargedataset ofategorialtimeseries and
shows that the multiple relational SOM approah an be used to interpret
whih dissimilaritiesprodue the most relevantlusters.
4.1. Comparisonbetweenon-lineandbathrelational SOMonagenetidata
set
This rst experiment aims at providing a omparison between on-line
and bath relationalSOM. It is performed on a data set that ontains 465
inputdataissuedfromtenunbalanedsampledspeiesofAmazonianbutter-
ies. This data set was previously used by [49℄ to demonstrate the synergy
between DNA baroding and morphologial-diversity studies. The notion
of DNA baroding omprises a wide family of moleular and bioinformatis
methods aimed at identifying biologial speimens and assigning them to a
speies. Aording to the vast literature published duringthe past years on
the topi, two separate tasks emerge for DNA baroding: on the one hand,
assign unknown observations to known speies and, onthe other hand, dis-
over undesribed speies, [50℄. The seond task isusually approahed with
the Neighbor Joining algorithm [51℄ whih onstruts a tree similar to a
dendrogram. When the sample size is large, the trees beome rapidly un-
readable. Moreover, they are quite sensitiveto the order inwhih the input
data are presented. Unsupervised learning and visualization methods are
used to a very limited extent by the DNA baroding ommunity, although
theinformationtheybringmaybequiteuseful. Self-organizingmapsprovide
a visualization of the data while bringing out lusters or groups of lusters
that may orrespond toyetunknown speies.
DNA baroding data are omposed of sequenes of nuleotides, i.e. se-
quenesofa,,g,t lettersinhighdimension(hundredsorthousandsof
sites). Hene, sine the data are not Eulidean, dissimilarity-basedmethods
appear tobemoreappropriate. Speidistanes anddissimilaritiessuhas
the Kimura-2P[52℄areusually omputed. Reently,bathmedianSOMwas
Although median SOM provided enouraging results, two main drawbaks
emerged. First,sinethe algorithmwasruninbath,the organizationofthe
map was generally poor and highlydependingon the initialization. Seond,
sine the algorithm alulates a prototype for eah luster among the data
set, it does not allow for empty lusters. Thus, the existene of speies or
groups of speies was diult to aknowledge. The use of on-line relational
SOM overomes these two issues. Figure2ontains the maps obtained with
median SOM and relational SOM with PCA initialization, both trained in
bath versions 4
and Figure3 illustrates the mappingprodued with the on-
linerelationalSOM. Thethreealgorithmswere runwithidentialxedseeds
for the random generators. The lustering quality of median SOM is poor,
sineseverallustersmixtogetherseveralspeies. Ontheontrary,relational
SOM allows for empty lusters and thus produes a better mapping, from
a lustering point of view: the only mixing lass orresponds to a labeling
error. Moreover, the empty ells help separating the main groupsof speies.
Clustering may thus be useful in addressing misidentiationissues.
Topographi errors were omputed for the three mappings in order to
assess the quality of the projetion. For the online algorithm, the error is
0.0022,forrelationalSOMwithPCAinitializationweobtained 0.3682,while
the error of median SOM is 0.3094. Hene, the stohastiity of the on-line
algorithmallowedforabetterorganizationofthemap, omparedwithbath
algorithms.
In Figure 3b, distanes with respet to the nearest neighbors were om-
puted for eah node. The distane between two nodes/ells is omputed as
themeandissimilaritybetweentheobservationswithineahlass. Apolygon
is drawn within eah ell with verties proportional to the distanes to its
neighbors. Iftwoneighborprototypesare very lose,then theorresponding
verties are very lose to the edges of the two ells. If the distane between
neighbor prototypes is very large, then the orresponding verties are far
apart, lose to the enter of the ells.
4
relationalbathSOMwithrandominitializationwasalsotestedbut,sinetheresults
were worse than the ones obtained with PCA initialization, they are not shown in this
artile.
Figure 2: Speies diversitydistributionbyluster (radiusproportionalto thesize ofthe
luster): MedianbathSOM(a)and RelationalbathSOMwithPCAinitialization(b).
(a) (b)
Figure 3: On-linerelationalSOMresultsfor Amazonianbutteries: (a)Speiesdiversity
distributionby luster (radiusis proportionalto theluster size. (b) Distanesbetween
prototypes.
4.2. On-line relational SOM and on-line kernel SOM to deipher the stru-
ture of politial networks
Thispresentsetion'spurposeistogiveaomparisonoftheperformanes
obtained with relational SOM when used with various metris. More pre-
isely, we will show that for strutural data suh as graphs, a kernel is not
alwaysthemostrelevantwaytoextratinformationfromthegraphstruture,
ompared to, i.e., the simple similarity based on the length of the shortest
path between two nodes.
Thedata usedinthis setionome fromtwofamous data sets pertaining
to the US politis. The rst data set is a graph where the nodes are 105
Amerian politial books, all published around the presidential eletion of
two books were o-purhased by a ommon buyer 5
. All nodes are labeled
aording to their politial aliation (onservative, liberal or neutral), and
this informationwillbe used tovalidate the results a posteriori.
The seonddata set is agraph representing the US politisblogosphere,
reorded in 2004, for the same presidential eletion as the previous one, by
Adami and Glane [54℄. This data set ontains 1 222 nodes whih are po-
litialblogsand 16714edges thatrepresenta hyperlinkbetween twoblogs 6
.
Again,additionalinformationpertainingthepolitialprefereneoftheblogis
alsoprovided(hereonlyonservativeorliberal). Bothgraphsarerepresented
inFigure 4by aFruhterman andReingold [55℄foredireted plaement al-
gorithm and nodes are olored aording to the politial aliation of the
book orof the blog.
Figure 4: Politialbooks (left)and blogs (right)networks. Nodesare labeled aording
to the politial orientation of thebook orof the blog: pinkis for onservative, blue for
liberalandgreenforneutral.
Relational SOM was performed to projet the nodes of the two graphs
ona square grid havingdimension
5 × 5
(books)and10 × 10
(blogs). Threedierentdissimilarities were used toperform this task:
5
This graph was built by Valdis Krebs and is available for downloading at
http://www-personal.umih.edu/~mejn/netdata/polbooks.zip.
6
The original graph was direted but weonly used undireted edges to perform our
analysis.
•
the length of the shortest path between two nodes. Notethat, in gen-eral, the length of the shortest path is not a Eulidean distane: for
the two graphs desribed in this setion, the ondition of [16℄ is not
satised;
•
a dissimilarity dened as the square of the distane indued by the heatkernel(K = e −γL whereL
istheLaplaian,[56℄),withparameters
γ = 0.1
and1. Inthisase,relationalSOMisequivalenttokernelSOM
as desribed in [18, 20℄;
•
a dissimilarity dened as the square of the distane indued by the ommute time kernel [57℄.The performanes of the tested methods were assessed using three riteria:
the modularity of the obtained partition, the neurons' purity (ompared to
the politial labels) and the topographierror of the map. The modularity
[58℄is a measure of quality of a partitionof the nodes in a graph:
Q = X U u=1
X
i,j:f (x i )=f (x j )=u
E ij − d i d j
2m
where
E ij = 1
i there is an edge between nodesx i and x j, d i is the degree
d i is the degree
of node
x i and m
is the number of edges in the graph. The best partition
orresponds to the largest modularity. The neurons' purity is a measure
of the onsisteny of the lustering with respet to the politial labels: it
ounts the frequeny of the politial labels of the nodes that are equal to
the majority politial label of the node's luster. The loser to 1 the purity
is, the better the lustering is. The last quality riterium, the topographi
error of the map [35℄, quanties the ontinuity of the map, with respet to
the input-spaemetriasalready explainedinSetion2.3. Notie that,asit
omputes the seond best mathing unit, the topographi error depends on
the metriof theinputspaeitselfandtellsusifthismetriiswellpreserved
on the map.
The results are given in Table 1. In addition, Figures 5 (books) and 6
(blogs) display two of the maps obtained for eah data set. First note that
the modularity obtained with the SOM algorithm should not be ompared
with that of a standard node lustering algorithm: the number of lusters
usedinsuhmapsisoftenmuhlargerthantheoptimalnumberoflustersfor
the modularity(forinstane, theoptimalmodularityfound bythe algorithm
length
γ = 0.1 γ = 1
kernelPolitial books
modularity 0.25 -0.05 0.08 0.27
purity 0.88 0.61 0.72 0.89
topo. error 0.048 0.133 0.038 0.038
Politialblogs
modularity 0.08 0.02 0.00 0.00
purity 0.93 0.89 0.79 0.57
topo error 0.303 0.047 0.322 0.899
Table 1: Modularity, neurons' purity and topographi error obtained for the data sets
politialbooks andpolitialblogsbyrelationalandkernelSOMalgorithms.
desribed in [59℄ givesonly 10 lusters, that should be ompared tothe 100
lusters of the map). Nevertheless, this measure of the lustering quality is
still validfor omparing dierent dissimilarities.
For the politial books data set, the best map is obtained by using the
on-line kernel SOM algorithm with the ommute time kernel. The on-line
relationalSOM withthe shortest path dissimilarityobtainsomparableper-
formane but the heat kernel gives poor results, whatever the value of
γ
.For the politial blogs, the on-line relational SOM gives good results and
manages to disriminate the two groups of blogs quite well, while its topo-
graphi error is rather large. On the other hand, the kernel SOM with the
heat kernel (
γ = 0.1
) has a muh better topographi error, while it badlydisriminates the labels and produes a bad lustering of the nodes of the
graph on the map (as measured by the modularity): this an be explained
by the fat that, even though the map properly represents the topographi
organization of the input spae, the metri used to represent the data may
not be the most aurate to emphasize some partiular features of the data
that an beof a major interest for the user.
In a seond step, a hierarhial lustering of the prototypes was per-
formed. Using the symboli representation of the prototypes as
p u ∼ P n
i=1 β ui x i
, the dissimilarity between two prototypes an be expressed as:δ(p u , p u ′ ) := − 1
2 (β u − β u ′ ) T ∆ (β u − β u ′ )
(6)and used asan input in the hierarhial lustering algorithm (see [20℄, The-
the shortest path dissimilarity(left) and by the on-line kernel SOMwith the ommute
timekernel(right).
orem 1,for ajustiationofthis formula). Onlythree andtwolusterswere
kept for, respetively, the politial blogs and the politial books data sets
in order to try toretrieve the original labels. The resultinglusters are dis-
played in Figures7 and 8. Moreover, the lasses' purity and modularity are
given in Table 2.
Dissimilarity Shortest path Heat kernel Heat kernel Commute time
length
γ = 0.1 γ = 1
kernelPolitial books
modularity 0.50 -0.02 -0.00 0.41
lasses' purity 0.84 0.49 0.47 0.76
Politialblogs
modularity 0.39 0.04 -0.00 0.00
lasses' purity 0.91 0.52 0.58 0.52
Table2: Modularityandpurityofthelassesobtainedbyahierarhiallusteringofthe
prototypes,forthedatasets politialbooks and politialblogs.
Asexpeted,hierarhiallusteringtendstoslightlyderease thelasses'
purity (ompared to the neurons' purity) and to strongly inrease the mod-
ularity. But italsoaets whih ofthe dissimilarities seemstorepresent the
data better: for both data sets, the shortest path dissimilarity overomes
the dissimilarities based on kernels. This shows that the use of a kernel is
theshortestpathdissimilarity(left)andbytheon-linekernelSOMwith theheatkernel
γ = 0.1
(right).not alwaysthe best possible hoie for omputing similarities/dissimilarities
between observations and thatallowingthe use ofa largerfamily ofdissimi-
larities an be useful in some ases.
4.3. Multiple relational SOM on simulated data
In this setion, a simple example is used to test the algorithm and il-
lustrate its behavior in the presene of omplementary information. 200
observations, divided into 8 groups (indexed from 1 to 8 in the following),
were generated using threedierent typesof data:
•
an unweighted graph, simulated similarly as the planted3
-partitiongraph desribed in[60℄. Thenodes ofthe groups1to4and the nodes
of the groups 5to8 ould not be distinguishedinthe graph struture:
the edgeswithinthesetwo setsofnodeswere randomlygeneratedwith
a probability equal to
0.3
. The edges between these two sets of nodeswere randomlygenerated with a probabilityequal to
0.01
;•
numerialdata that were two dimensionalGaussianvetors. Thevari- ables orresponding to observations of odd groups were simulated byGaussianvetorswithmean
(0, 0)
and independentomponentshaving avarianeequalto0.3
andthe variablesorrespondingtoobservations of even groups were simulated by Gaussian vetors with mean(1, 1)
and independent omponentshaving a varianeequal to
0.3
;the shortest path dissimilarity(left) and by the on-line kernel SOMwith the ommute
timekernel(right).
•
a fator with 2 levels. Observations of groups 1, 2, 5, and 7 were aeted to the rst level and observations of the other groups to theseond level.
Hene, only the ombined knowledge of the three data sets gave aess
to the eight original groups. The multiple relational SOM algorithm was
applied to this problem with the shortest path distane for the graph, the
standard Eulidean distane for the numerial data and Die's distane for
the fator variable (equal to 0 if the fators are idential between the two
observationsand to1 if not). The algorithmwasompared with
•
a multiplekernel SOM approah as desribed in [2℄ where the kernelsusedwere theommutetimekernel[57℄forthe graphandthe Gaussian
kernelforboththeotherdatasets(thefatorwasreodedasanumeri
variable using its disjuntive form). The parameter of the Gaussian
kernel was set as reommended in[61℄;
•
a standard relational SOM approah using one of the three data setsonly. It was also ompared to the dissimilarity SOM using numerial
andfatordataorallthethreedatasetsbutusedasifthey wereissued
fromthesame dataset withaEulideandistane (whenthegraphwas
added to the numerial and fator data, it was under the form of its
adjaeny matrix).
theshortestpathdissimilarity(left)andbytheon-linekernelSOMwith theheatkernel
γ = 0.1
(right).The omparison was performed on 100 dierent data sets generated as pre-
viously desribed.
The performanes of the dierent approahes were ompared using the
normalized mutual information[62℄ with respet to the originallasses, the
average node purity,taking again asa referenethe originallasses, and the
topographi error [35℄. The rst two quality measures quantify the adequa-
tion between the original lasses and the lustering provided by the SOM.
The node purity has values between 0and 1 and is equalto1 when the two
partitionsareidential. Thelastqualitymeasure,thetopographierror,does
not depend onthe original lass but itquanties the ontinuity of the map,
with respet to the input spae metri. The results are given in Figure 9,
whih displays the distributions of the normalized mutual information, the
nodes' purity and the topographi error, over the 100 data sets. Figure 10
provides examplesof resulting maps.
pographierror(bottom)ofmultiplerelationalSOM,multiplekernelSOMandrelational
SOMusedwithallortwodatasets(num&fa)simplymergedinasingledatasetorwith
asingledataset (graph,numeriorfator).
Taking into aount the lustering quality (normalized mutual informa-
tion), the node purity and the topographi quality, the multiple relational
SOM outperforms the othermethods. The dierenebetween the use of the
shortest-pathdissimilarityandakernelforgraphinasimilarmultiplesetting
is small but still signiant (with
p
-values smaller than10 −9 for Wiloxon
paired tests).
sion of the results beauseit penalizes the fatthat the originallusters are
separated into several neurons on the grid. This explains the good perfor-
mane(despitetheirlargevariability),intermofnormalizedmutualinforma-
tion, ofthegrid builtfromthe numerivariablesandthe fatoronlybeause
this latter map ontains muh more empty lusters as shown in Figure 10.
Onthe ontrary,the exampleofthemapresultingfromon-linemultiplerela-
tionalSOM inFigure10shows agoodlassiationandagoodorganization
aording to the three types of information: the eight groups are almost
perfetlydistinguishedby the algorithm.
Also note that the topographi error is not an optimal way to ompare
the results obtained with data sets that do not ontain the same amount of
information: indeed, the very good topographi error obtained by the map
trained from the numeri data only or the fator only simply means that
the topographi properties of these data is well preserved on the map but
this annot be ompared to the multiple relation SOM, the multiple kernel
SOM orthe map trainedwith alldata anda standard SOM:these maps are
supposed to preserve the topographi properties of all three sorts of data,
whihis aharder taskthan preserving the topographiproperty ofonly one
sort(numeri,fator,graph)ofdata. Inthisase,mergingalldatainasingle
data set whih isthen passed asaninput toa numeriSOM leads toavery
bad topographierror(approximately30times larger thanthe one obtained
with multiplerelationalSOM or multiple kernel SOM).
Theevolutionof the
α
,shown inFigure 10isalsointeresting: the Die's distane,whihistheonlysimilaritymeasurebasedonanonnoisysetofdataobtains larger weights than the other two dissimilarities. This is onsistent
with the fat that these data are indeed the best of the three data sets to
distinguish between the original lusters: as shown in Figure 9, the map
basedonthefator isbetterinterms ofnormalizedmutualinformationthan
theones basedonthenumerivariablesoronthegraphonly(itsnodepurity
isverylowbeauseitperfetlydistinguishedthedataintotwolusterswhere
four originallusters are equally mixed).
α
evolution multiplerSOMrSOM (simple)rSOM
with numeri variablesand fator with all three datasets
Figure 10: Summary of the experiment: the original graph and the original distri-
bution ofthenumerialvariablesisgivenatthetopofthegure;multiple rSOMresults
(seond row)withtheevolutionof the
α
andtheresultingmap (diskshaveanareapro-portionalto thenumberof observations andare oloredaordingto thedistribution of
the originallassesin theorrespondingneuron);bottom: maps obtainedusingnumeri
variablesandfatormergedinasingledatasetandasimpleEulideandistane(left)and
usingallthreedatasets butasimpleEulideandistane.
The last example illustrates multiple relational SOM on data related to
shool-to-worktransitions. Weused thedatainthesurvey Generation98 7
.
Aording to the Frenh National Institute of Statistis (INSEE), 22.7% of
young people under 25 were unemployed at the end of the rst semester
2012.
8
Hene, it is ruial to understand how the transition from shool to
employmentor unemployment is ahieved, in the urrent eonomi ontext.
The dataset ontains informationon16040 youngpeoplehavinggraduated
in1998and monitoredduring94monthsafterhavingleftshool. The labor-
market statuses have nine ategories, labeled as follows: permanent-labor
ontrat, xed-term ontrat, apprentieship ontrat, publi temporary-
labor ontrat, on-all ontrat, unemployed, inative, militaryservie, ed-
uation. The following stylized fats are highlighted by a rst desriptive
analysis of the data asshown in Figure11:
•
permanent-laborontratsrepresentmorethan20%ofallstatusesafter oneyearandtheirratioontinuestoinreaseuntil50%afterthreeyearsand almost 75% after seven years;
•
the ratio of xed-terms ontrats is more than 20% after one year onthelabormarket,but itisdereasingto15%afterthreeyears andthen
seems to onverge to 8%;
•
almost30%oftheyounggraduatesareunemployedafteroneyear. Thisratio is dereasing and beomes onstant,10%, after the fourth year.
The dissimilarities between sequenes were omputed using optimal math-
ing (OM). Also known as edit distane or Levenshtein distane, optimal
mathing was rst introdued in biology by [31℄ and used for aligning and
omparingsequenes. Insoialsienes,therstappliationsaredueto[32℄.
The underlying idea of optimal mathing is to transform one sequene into
anotherusingthreepossibleoperations: insertion,deletionandsubstitution.
Aostisassoiatedtoeahofthethreeoperations. Thedissimilaritybetween
7
available thanks to Génération1998 á 7 ans - 2005, [produer℄ CEREQ, [diusion℄
CentreMaurieHalbwahs(CMH)
8
All omputations were performed with the free statistial software environment R
(http://ran.r-projet.org/,[25℄). Thegraphialillustrationswerearriedoutusing
theTraMineRpakage[63℄.
sequenes is then omputed as the ost assoiated to the smallest number
of operations whih allows to transformthe sequenes into eah other. The
methodseemssimple andrelativelyintuitive,butthe hoie of theosts isa
deliateoperationinsoialsienes. Thistopiissubjettolivelydebatesin
the literature [4, 5℄ mostly beause of the diulties toestablish anexpliit
and sound theoretial frame.
Inourappliation,allareerpathshavethesamelength,thestatusofthe
graduatestudentsbeingobserved during94months. Hene,wesuppose that
there arenoinsertions ordeletionsandthat onlythe substitutionostshave
to be dened for OM metris. Among optimal-mathing dissimilarities, we
seleted threedissimilarities: the OMwithsubstitutionostsomputed from
the transition matrix between statuses as proposed in [64℄, the Hamming
dissimilarity (HAM, no insertion or deletion osts and a substitution ost
equal to 1) and the Dynami Hamming dissimilarity (DHD as desribed in
[65℄).
In order to identify the role of the dierent dissimilarities in extrating
typologies, we onsidered several samples drawn at random from the data.
For eah of the experiments below, 50 samples ontaining 1 000 input se-
queneseahwereonsidered. Inordertoassessthequality ofthe maps,two
indexes were omputed: the quantizationerrorfor quantifying thequalityof
the lustering and the topographi error for quantifying the quality of the
mapping, [66℄. These quality riteria all depend on the dissimilarities used
to train the map but the results are made omparable by using normalized
dissimilarities.
α
-Mean 0.43111 0.28459 0.28429α
-Std 0.02912 0.01464 0.01523Metri OM HAM DHD Optimally-tuned
α
Quantizationerror 92.93672 121.67305 121.05520 114.84431
Topographierror 0.07390 0.08806 0.08124 0.05268
Table3: Preliminaryresultsforthree OMmetris(averageover50randomsubsamples):
Optimally-tuned
α
(toptable)andQualityriteriafortheSOMlustering(bottomtable).The results are listed in Table 3. Aording to the mean values of the
α
's,the threedissimilaritiesontributed toextratingtypologies. The Ham- ming andthedynamialHammingdissimilaritieshavesimilarweights,whilethe OM with ost-matrixdened fromthe transitionmatrix has the largest
weight. The mean quantization error omputed on the maps trained with
the three dissimilarities optimally ombined is larger than the quantization
error omputed on the map trained with the OM metrionly. On the other
hand, the topographierror is improved inthe mixed ase. In this ase, the
jointuse of the three dissimilaritiesprovides atrade-o between the quality
of the lustering and the quality of the mapping. The results onrm the
diulty todene adequateosts inoptimalmathing and the fatthat the
metrihastobehosenaordingtotheaimofthestudy: buildingtypologies
(lustering) or visualizingdata (mapping).
Finally, multiple rSOM was trained on the entire data set. The nal
map is illustrated in Figure 12. Several typologies emerge from the map: a
fast aess to permanent ontrats (lear blue), a transition through xed-
term ontrats before obtaining stable ones (dark and then lear blue), a
holding on prearious jobs (dark blue), a publi temporary ontrat (dark
green)oranon-all(pink)ontratendingattheend byastable one,along
period of inativity (yellow) or unemployment (red) with a gradual return
to employment. The mapping also shows a progressive transition between
trajetories of exlusion on the west and quik integration on the east. A
more detailedstudy of this data set is availablein [67℄.
5. Conlusion
An on-line version of relational SOM is introdued in this paper. It
ombines the standard advantage of the stohasti version of SOM (better
organization) with relational SOM, whih is able to handle data desribed
by dissimilarities. This approah is extended to the ase where several dis-
similarities are available for the initial data set. Online multiple relational
SOMhandlesseveraldissimilaritiesbyombiningtheminanoptimalfashion.
The algorithmshows good performanes, ompared toalternativemethods,
in projeting data desribed by numerialvariables,by ategorial variables
or by relations and is helpful to understand whih dissimilarity is the most
relevantwhen several ones are available. However, initsmultipledissimilar-
ity version, the main drawbak of the proposed relationalSOM algorithmis
related to the omputationtime: a sparse version should be investigated to
allow usto handlevery large data sets.
6. Aknowledgements
We thank the anonymous referees fortheir thorough omments and sug-
gestions and for valuable hints oninteresting referenes.
[1℄ N. Villa-Vialaneix, M. Olteanu, C. Ciero-Ayrolles, Carte auto-
organisatriepourgraphesétiquetés,in: AtesdesAteliersFGG(Fouille
de GrandsGraphes), olloqueEGC (ExtrationetGestion de Connais-
sanes), Toulouse,Frane,2013.
[2℄ M.Olteanu,N.Villa-Vialaneix,C. Ciero-Ayrolles, Multiplekernelself-
organizing maps, in: M. Verleysen (Ed.), XXIst European Symposium
onArtiial NeuralNetworks,ComputationalIntelligene and Mahine
Learning(ESANN),d-sidepubliations,Bruges, Belgium,2013,pp. 83
88.
[3℄ E. Pkalska, R. Duin, The Dissimilarity Representation for Pattern
Reognition. Foundations and Appliations, World Sienti, 2005.
[4℄ A. Abbott, A. Tsay, Sequene analysis and optimalmathing methods
in soiology, Review and Prospet. SoiologialMethods and Researh
29 (2000)333.
[5℄ L. Wu, Some omments on Sequene analysis and optimal mathing
methods in soiology, review and prospet, Soiologial Methods and
Researh 29 (2000)4164.
[6℄ M. Cottrell, M. Olteanu, F. Rossi, J. Rynkiewiz, N. Villa-Vialaneix,
Neural networks foromplexdata, KünstliheIntelligenz26(2012)18.
[7℄ M. Cottrell, P. Letrémy, How to use the Kohonen algorithmto simul-
taneously analyse individuals in a survey, Neuroomputing 63 (2005)
193207.
[8℄ T. Kohohen, P.Somervuo, Self-organizingmapsofsymbolstrings, Neu-
roomputing21 (1998)1930.
[9℄ B. Conan-Guez, F. Rossi, A. El Golli, Fast algorithm and implemen-
tation of dissimilarity self-organizing maps, Neural Networks 19(2006)
855863.
[10℄ D. MaDonald, C. Fyfe, The kernel selforganising map., in: Proeed-
ings of 4th International Conferene on knowledge-based intelligene
engineering systems and applied tehnologies, 2000, pp. 317320.
Systems 12 (2002)117135.
[12℄ N. Villa, F. Rossi, A omparison between dissimilarity SOM and
kernel SOM for lustering the verties of a graph, in: 6th In-
ternational Workshop on Self-Organizing Maps (WSOM), Neuroin-
formatis Group, Bieleeld University, Bieleeld, Germany, 2007.
doi:10.2390/bieoll-wsom2007-13 9.
[13℄ R. Boulet, B. Jouve, F. Rossi, N. Villa, Bath kernel SOM and re-
latedlaplaianmethodsforsoialnetworkanalysis, Neuroomputing71
(2008) 12571273.
[14℄ T. Gärtner,KernelforStruturedData,volume72ofSeriesin Mahine
Pereption and Artiial Intelligene,World Sienti, 2008.
[15℄ I. Shoenberg, Remarks to Maurie Fréhet's artile Sur la dénition
axiomatique d'une lasse d'espae distaniés vetoriellement appliable
sur l'espaede Hilbert, Annals of Mathematis 36(1935) 724732.
[16℄ G.Young,A.Householder, Disussionofasetofpointsintermsoftheir
mutualdistanes, Psyhometrika3 (1938) 1922.
[17℄ N. Krislok,H. Wolkowiz,Handbook onSemidenite,Coniand Poly-
nomial Optimization, volume 166 of International Series in Operations
Researh & Management Siene,Springer, 2012, pp. 879914.
[18℄ B. Hammer,A. Hasenfuss,F. Rossi, M.Strikert, Topographiproess-
ing of relational data, in: B. U. Neuroinformatis Group (Ed.), Pro-
eedings of the 6th Workshop on Self-Organizing Maps (WSOM 07),
Bielefeld, Germany,2007.
[19℄ F. Rossi, A. Hasenfuss, B. Hammer, Aelerating relational lustering
algorithmswithsparse prototype representation, in: Proeedings of the
6th Workshop on Self-OrganizingMaps (WSOM 07), Neuroinformatis
Group, Bieleeld University, Bieleeld,Germany, 2007.
[20℄ B. Hammer, A. Hasenfuss, Topographi mappingof large dissimilarity
data sets, Neural Computation22(2010) 22292284.
X. Zhu, Topographi mapping of dissimilarity data, in: J. Laaksonen,
T. Honkela (Eds.), Advanes in Self-Organizing Maps (Proeedings of
the 8thWorkshoponSelf-OrganizingMaps,WSOM 2011),volume6731
of Leture Notes in Computer Siene, Springer, Espoo, Finland, 2011,
pp. 115.
[22℄ L. Goldfarb, A uniedapproahtopatternreognition, PatternReog-
nition 17(1984) 575582.
[23℄ J. Fort, P. Letremy, M. Cottrell, Advantages and drawbaks of the
bath kohonen algorithm, in: M. Verleysen (Ed.), Proeedings of 10th
European Symposium on Artiial Neural Networks (ESANN 2002),
Bruges, Belgium,2002, pp. 223230.
[24℄ M. Olteanu, N. Villa-Vialaneix, M. Cottrell, On-line relational som
for dissimilaritydata, in: P.Estevez, J.Prinipe,P.Zegers, G.Barreto
(Eds.),AdvanesinSelf-OrganizingMaps(ProeedingsofWSOM2012),
volume 198 of AISC (Advanes in IntelligentSystems and Computing),
Springer Verlag, Berlin, Heidelberg, Santiago, Chile, 2012, pp. 1322.
doi:10.1007/978-3-642-35230-0_ 2.
[25℄ R Development Core Team, R: A Language and Environ-
ment for Statistial Computing, Vienna, Austria, 2012. URL:
http://www.R-projet.org,ISBN 3-900051-07-0.
[26℄ S.Theuÿl,A.Zeileis, CollaborativesoftwaredevelopmentusingR-Forge,
The R Journal 1 (2009)914. URL: http://journal.R-projet.org/.
[27℄ RStudio and In., shiny: Web Appliation Framework for R, 2013.
URL: http://CRAN.R-projet.or g/pa kag e=sh iny, R pakage ver-
sion 0.6.0.
[28℄ T. Heskes, Energy funtions for self-organizing maps, in: E. Oja,
S. Kaski (Eds.), Kohonen Maps, Elsevier, Amsterdam, 1999, pp. 303
315. URL: http://www.snn.ru.nl/reports /Hes kes. wsom .ps. gz.
[29℄ A. El Golli,F. Rossi, B. Conan-Guez, Y. Lehevallier, Une adaptation
des artesauto-organisatries pour des données déritespar un tableau
de dissimilarités, Revue de StatistiqueAppliquée LIV (2006)3364.