HAL Id: inria-00000289
https://hal.inria.fr/inria-00000289
Submitted on 22 Sep 2005
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Application of the 2-3 Agglomerative Hierarchical Classification on Web Usage Data.
Sergiu Chelcea, Brigitte Trousse
To cite this version:
Sergiu Chelcea, Brigitte Trousse. Application of the 2-3 Agglomerative Hierarchical Classification on
Web Usage Data. SYNASC 2004, 6th International Workshop on Symbolic and Numeric Algorithms
for Scientific Computing, Sep 2004, Timisoara, Romania. inria-00000289
Application of the 2-3 Agglomerative Hierarchical Classification on Web usage data
Sergiu Chelcea and Brigitte Trousse
AxIS Research Team, INRIA
Sophia-Antipolis, France
BP 93, 06902 Sophia-Antipolis Cedex
FirstName.LastName@inria.fr
Abstract. In this paper we present a clustering of INRIA Web sites' visited topics, using a hierarchical classification method, the 2-3 Hierarchical Ascending Classification proposed by Patrice Bertrand in [1]. The obtained clusters are then analyzed for their relevance.
Keywords and phrases: Clustering, Web usage data, Classification, Data Mining, 2-3 Hierarchies
1 Introduction
Nowadays the rapid and continuous growth of the Web can be a serious obstacle for users in their information search, despite the improvement of existing search engines. Personalizing the Web space by analyzing users' behaviour can reduce their quest for information and help them find more easily what they are looking for. For this, Data Mining techniques can be used for Web data analysis, in one of the three main Web Mining domains: Web Content Mining, Web Structure Mining and Web Usage Mining (WUM).
The Web usage data used during the WUM process are generally the users' navigational paths gathered in Web server logs, sometimes correlated with information from the other Web Mining processes, e.g. the site structure. The analysis of users' behaviour helps in the redesign of the Web site(s) and also facilitates users' information search (through dynamically inserted links).
In this paper we present a clustering of the INRIA Web sites' visited topics, using a hierarchical classification method, the 2-3 Hierarchical Ascending Classification proposed by Patrice Bertrand in [1]. The 2-3 Agglomerative Hierarchical Clustering (2-3 AHC) [1] generalizes the Agglomerative Hierarchical Clustering (AHC) by giving each cluster the possibility of intersecting at most one other cluster, when the obtained intersection is distinct from the two clusters. This characteristic allows the resulting cluster structure to highlight groups of objects (clusters) that have the common characteristics of two other groups (which is not possible with the classical AHC).
Using our new 2-3 AHC algorithm [3], which has the same $O(n^2 \log n)$ algorithmic complexity as the classical AHC, we have analyzed the activity of the visitors of INRIA's Web sites based on their behaviour.
In the first subsection we present some hierarchical classification notions along with the classical AHC algorithm. The second subsection presents the 2-3 AHC concept introduced in [1], followed by a short description of our 2-3 AHC algorithm [3], used later along with the classical AHC algorithm to classify the Web usage data (Section 3.2).
2.1 Agglomerative Hierarchical Classification
Clustering, or unsupervised classification, is a Data Mining technique used to group together similar objects into classes, also known as clusters. Well-known clustering techniques include neural networks, hierarchical classification methods, fuzzy networks, decision trees, etc. Each of these clustering techniques generates a cluster set organized by its specific structure (partitions, hierarchies, pyramids, etc.).
In the agglomerative hierarchical methods, starting from the initial elements (the singletons), the clusters are successively merged into higher level clusters, until the entire set of analyzed objects becomes a cluster. The resulting hierarchical structures (hierarchies, 2-3 hierarchies, pyramids) can then be easily visualized using a graphic called a dendrogram.
In order to present the 2-3 Agglomerative Hierarchical Classification [1] and our 2-3 AHC algorithm [3], we first recall the classical AHC principle.
Sokal and Sneath proposed in 1963 [6] the first version of the classical AHC algorithm, consisting of two phases: initialization and merging.
During the first phase, the singletons' distance (dissimilarity) matrix is computed using a distance (dissimilarity) measure (e.g. the Euclidean distance), while the singletons represent the initial set of clusters. In the second phase, successive mergings are performed between the two closest clusters, until the initial objects are all merged into a final cluster. The two clusters are closest in the sense of a chosen aggregation link, simply called the link (e.g. single link, complete link, average link). Together with this criterion, others like the cardinality and the lexicographical order can be used to determine the closest clusters.
The link value between two merged clusters X and Y represents the heterogeneity degree of the resulting cluster $X \cup Y$, and is denoted $f(X \cup Y)$. In the first phase of the AHC algorithm, the f values of the singletons are set to 0. At the end of the second phase, the resulting structure is a sequence of nested partitions which can be visualized using a dendrogram (a graphic based on the heterogeneity degree f of the clusters).
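The two phases described above can be sketched as follows. This is a minimal, naive illustration and not the authors' implementation: the function name `ahc`, the list-of-lists distance matrix and the toy data are assumptions, ties are broken by encounter order rather than by cardinality or lexicographical order, and only the single and complete links are covered.

```python
from itertools import combinations

def ahc(dist, link="single"):
    """Naive classical AHC on a symmetric distance matrix (list of lists).

    Returns the merges in creation order as (X, Y, f) triples, where f is
    the link value at which clusters X and Y were merged.
    """
    aggregate = {"single": min, "complete": max}[link]

    def link_value(pair):
        # Link between two clusters, from the pairwise element distances.
        x, y = pair
        return aggregate(dist[i][j] for i in x for j in y)

    # Initialization phase: every element is a singleton cluster (f = 0).
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    # Merging phase: repeatedly merge the two closest clusters.
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=link_value)
        merges.append((x, y, link_value((x, y))))
        # Only the merged cluster is kept for future mergings.
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
    return merges

# Hypothetical three-element data set.
dist = [[0, 1, 4],
        [1, 0, 2],
        [4, 2, 0]]
for x, y, f in ahc(dist, "single"):
    print(sorted(x), sorted(y), f)
```

The sketch recomputes every link value at each step, so it is far slower than the $O(n^2 \log n)$ algorithms discussed in the text; it is only meant to make the merging rule concrete.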
A small example of a classical hierarchy is presented in Figure 1. In order to choose a partition from the resulting hierarchy, the differences between the f levels of the created clusters are analyzed. Usually the partitioning level is chosen where there is a "big" difference between the levels of the created clusters.
1
generated by $P_2$, since the clusters are more homogeneous.
Fig. 1. Classical hierarchy (clusters X and Y merged into $X \cup Y$)
Fig. 2. Hierarchy partitioning (partition levels $P_1$ and $P_2$ on the elements a to g)
Whenever a new cluster $X \cup Y$ is created by merging two other clusters X and Y, we say that $X \cup Y$ is the predecessor of X and Y, while X and Y are successors of $X \cup Y$ (cf. Figure 1). The clusters found on the same f level as their predecessor are non-relevant clusters from the viewpoint of cluster analysis, and can be eliminated from the hierarchy after its creation. This final elimination step is called the refinement step and will produce a strictly indexed hierarchy, making the corresponding dendrogram easier to visualize.
The main characteristic of the classical AHC algorithm is that after each merging, only the resulting cluster is kept for future mergings. Thus the resulting structure (the hierarchy) will contain only nested or disjoint clusters, also called hierarchical clusters.
A hierarchy induces a new distance matrix (an ultrametric) over the initial elements, based on the distances at which they were first grouped in a cluster. This matrix can then be compared with the initial matrix, or with the matrices induced by other methods, for quality analysis [3] using different indices (e.g. the Stress formula [5]).
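As an illustration of the induced matrix, the sketch below rebuilds it from a list of merges of a classical hierarchy. The `(X, Y, f)` merge format, the function names and the toy data are assumptions made for this example, and the sum of squared differences is only a simple stand-in for a comparison index, not the Stress formula of [5].

```python
def induced_ultrametric(n, merges):
    """u[i][j] is the f level of the first cluster containing both i and j.

    merges: (X, Y, f) triples in creation order, X and Y being disjoint
    sets of element indices (classical hierarchy).
    """
    u = [[0] * n for _ in range(n)]
    for x, y, f in merges:
        # Elements of X and Y meet for the first time in the cluster X | Y.
        for i in x:
            for j in y:
                u[i][j] = u[j][i] = f
    return u

def squared_gap(d, u):
    """Sum of squared differences between two symmetric matrices."""
    n = len(d)
    return sum((d[i][j] - u[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

# Hypothetical example: single-link merges on three elements.
d = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
merges = [(frozenset({0}), frozenset({1}), 1),
          (frozenset({2}), frozenset({0, 1}), 2)]
u = induced_ultrametric(3, merges)
print(u, squared_gap(d, u))
```

A smaller gap means the hierarchy distorts the initial dissimilarities less, which is the kind of comparison the quality analysis relies on.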
2.2 2-3 Hierarchies and 2-3 AHC Algorithm
We present here the 2-3 hierarchy concept along with the 2-3 Agglomerative Hierarchical Classification method, introduced in [1] in order to generalize the classical AHC and make it more flexible.
As we saw before, the AHC generates clusters that are either disjoint or included one in the other. The 2-3 Agglomerative Hierarchical Classification method proposed in [1] gives each cluster the possibility of intersecting at most one other cluster, when the obtained intersection is distinct from the two clusters. This characteristic allows the obtained cluster structure to highlight groups of objects having the common characteristics of two other groups (not possible with the AHC).
The resulting cluster structure is called a 2-3 hierarchy, a term justified by [...] formed cluster pairs are hierarchical (nested or disjoint).
The term 2-3 hierarchy specifies how the set of 2-3 hierarchies is an extension of the set of hierarchies: indeed, from the definition above it clearly follows that a hierarchy is a particular case of a 2-3 hierarchy. This happens when all three possible cluster pairs are hierarchical, thus leading to a classical hierarchy.
A 2-3 hierarchy example is presented in Figure 3 below. When two clusters are not hierarchical, we say that they properly intersect each other: X properly intersects Y if $X \cap Y \notin \{X, Y, \emptyset\}$. For instance, in the example from Figure 3,
theluster fbagproperly intersetsfag,while thelustersfbagand fbagare
hierarhial.
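The proper-intersection test can be written directly on Python sets. The sketch below is illustrative only: the function names and the example clusters are hypothetical (they are not read from Figure 3), and `at_most_one_proper_intersection` checks only the property stated in the text (each cluster properly intersects at most one other cluster), not the complete definition of a 2-3 hierarchy given in [1].

```python
def properly_intersects(x, y):
    """True when the intersection of x and y is not x, not y, and not empty."""
    inter = x & y
    return bool(inter) and inter != x and inter != y

def is_hierarchical(x, y):
    """Two clusters are hierarchical when they are nested or disjoint."""
    return not properly_intersects(x, y)

def at_most_one_proper_intersection(clusters):
    """Check that every cluster properly intersects at most one other cluster."""
    return all(
        sum(properly_intersects(x, y) for y in clusters if y is not x) <= 1
        for x in clusters
    )

# Hypothetical clusters over the elements a, b, c, d.
ab, ac, abc = frozenset("ab"), frozenset("ac"), frozenset("abc")
print(properly_intersects(ab, ac))     # the two clusters share only the element a
print(is_hierarchical(ab, abc))        # ab is nested in abc
print(at_most_one_proper_intersection([ab, ac, abc]))
```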
Fig. 3. Example of a 2-3 hierarchy
Since the 2-3 hierarchies allow clusters to properly intersect each other, their structures are richer compared with the classical hierarchies obtained on the same data sets [3]. For example, the maximal number of clusters created by the classical AHC is $n - 1$, compared with $[\frac{3}{2}(n - 1)]$ for the 2-3 AHC [1], where n is the initial number of elements. Comparative tests [3] have shown an 11% (with single-link) to 45% (with complete-link) increase in the number of created clusters for our 2-3 AHC algorithm. Figure 4 shows a small example of a hierarchy and a 2-3 hierarchy created on a three-point data set, using the single link.
Fig. 4. AHC and 2-3 AHC
During our experiments on the Web usage data presented in the next section, we have used our simplified 2-3 AHC algorithm proposed in [2]. Compared with the initial 2-3 AHC algorithm [1], our 2-3 AHC algorithm has the advantage of a smaller complexity ($O(n^2 \log n)$ instead of the initial $O(n^3)$), and
3.1 Motivations
Nowadays the rapid and continuous growth of the Web can be a serious obstacle for users in their information search, despite the improvements of existing search engines. Personalizing the Web space by analyzing users' behaviour can reduce their quest for information and help them find more easily what they are looking for. For this, Data Mining techniques can be used for Web data analysis, in one of the three main Web Mining domains: Web Content Mining, Web Structure Mining and Web Usage Mining (WUM).
The Web usage data used during the WUM process are generally the users' navigational paths gathered in Web server logs, sometimes correlated with information from the other Web Mining processes, e.g. the site structure. The analysis of users' behaviour helps in the redesign of the Web site(s) and also facilitates users' information search (through dynamically inserted links).
Users searching for information on INRIA's Web site are transparently browsing through the interconnected pages of INRIA's Web servers. To trace their behaviour, we have analyzed the access log files from two INRIA Web servers: the national Web server (http://www.inria.fr) and the Sophia Antipolis research unit Web server (http://www-sop.inria.fr).
Since INRIA's scientific organization into research teams has recently changed (1st April 2004), we have chosen to study the Web users' behaviour over two different 15-day time periods:
- from 01 until 15 January 2003, period denoted in the following as $Per_1$,
- and from 27 May until 10 June 2004, period denoted as $Per_2$.
Indeed, the main motivation of our study here was to analyze the impact of the changes in the Web site structure (see Appendix A) on users' behaviour when searching for information. More particularly, our study concerned the clustering of INRIA Web sites' visited topics, using the 2-3 Hierarchical Ascending Classification presented in the previous section, and was done in two phases:
- first, the preprocessing of the Web access logs (based on the work of Tanasa et al. [7]), presented in section 3.2,
- secondly, the data mining (using our 2-3 AHC algorithm) and the result analysis phase (section 4).
3.2 Data preprocessing
In this subsection, we briefly explain the data preprocessing methodology proposed by Tanasa et al. (see [7] for more details). We used the AxISLogMiner tool developed within the AxIS research team at INRIA Sophia Antipolis.
The aim of the preprocessing phase was to identify and extract user navigations (sequences of user actions) from the raw Web logs, and was done in four steps: data fusion, data cleaning, data structuration and data summarization.
During the data fusion step, the Web log files were joined together for each analyzed period (resulting in log $L_1$ for $Per_1$ and log $L_2$ for $Per_2$), in order to reconstruct the cross-server users' navigations.
Thus the two joined logs contained all the requests (chronologically sorted) made by different users for different resources on the two Web servers, over the given periods of time. Some of these requests were made for resources that are non-relevant from our analysis viewpoint and were eliminated in the data cleaning step. For example, we are not interested in requests for images, since they usually are implicit requests (images contained in the accessed page [7]). Also, the requests made by Web crawlers (robots) have been eliminated from the Web logs, since they do not correspond to human navigations. Filtering out all these requests has reduced $L_1$ to 11% and $L_2$ to 15% of their original size. For example, for $L_2$ the number of requests was reduced from 4,473,228 to 686,084, equivalent to a log file size reduction from 901 Mb to 135 Mb.
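A data cleaning filter of this kind can be sketched as below. The paper does not give the exact rules used by AxISLogMiner, so everything here is an assumption made for illustration: the request records as dictionaries with `url` and `agent` keys, the list of image extensions, and the crude substring heuristics used to recognise robots.

```python
# Assumed, illustrative rules; the actual cleaning rules are described in [7].
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")
ROBOT_MARKERS = ("bot", "crawler", "spider")

def is_relevant(request):
    """Keep a request unless it targets an image or comes from a robot."""
    url = request["url"].lower().split("?")[0]
    agent = (request.get("agent") or "").lower()
    if url.endswith(IMAGE_EXTENSIONS):
        return False                      # implicit request for an image
    if url.endswith("/robots.txt"):
        return False                      # typical first request of a crawler
    if any(marker in agent for marker in ROBOT_MARKERS):
        return False                      # user agent announces a robot
    return True

def clean(requests):
    """Return only the requests that are relevant for the analysis."""
    return [r for r in requests if is_relevant(r)]

# Hypothetical log records.
log = [
    {"url": "/axis/index.html", "agent": "Mozilla/4.0"},
    {"url": "/images/logo.gif", "agent": "Mozilla/4.0"},
    {"url": "/axis/index.html", "agent": "ExampleBot/1.0"},
]
print(len(clean(log)), "of", len(log), "requests kept")
```

For the figures quoted above, 686,084 kept requests out of 4,473,228 is about 15%, which matches the reduction reported for $L_2$.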
Next, the data structuration step grouped the requests of the unstructured log files by user, user session, page view, and visit (navigation). Since log files contain only the computer IP and (sometimes) the user agent, we have considered as a user the couple (IP, [User Agent]) and as a user session all the actions performed by the user over the analyzed period.
Then the users' navigations were obtained by splitting every user session using a 30-minute threshold: 173,015 navigations for $L_1$ and 145,454 navigations for $L_2$.
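The structuration into users, sessions and navigations can be sketched as follows. The tuple layout of a request, the function name `split_navigations` and the choice to split only when the gap is strictly greater than 30 minutes are assumptions made for this example; the methodology actually used is the one of [7].

```python
from collections import defaultdict
from datetime import datetime, timedelta

def split_navigations(requests, threshold=timedelta(minutes=30)):
    """Group requests by user and cut each user session into navigations.

    requests: iterable of (ip, user_agent, timestamp, url) tuples.
    Returns {(ip, user_agent): [navigation, ...]}, where a navigation is a
    chronologically sorted list of (timestamp, url) pairs.
    """
    # A user is the couple (IP, user agent); a session is all of its requests.
    sessions = defaultdict(list)
    for ip, agent, ts, url in requests:
        sessions[(ip, agent)].append((ts, url))

    navigations = {}
    for user, hits in sessions.items():
        hits.sort()
        navs = [[hits[0]]]
        for prev, cur in zip(hits, hits[1:]):
            # A long pause between two requests starts a new navigation.
            if cur[0] - prev[0] > threshold:
                navs.append([])
            navs[-1].append(cur)
        navigations[user] = navs
    return navigations

# Hypothetical requests from a single user.
t0 = datetime(2004, 5, 27, 10, 0)
reqs = [("10.0.0.1", "Mozilla/4.0", t0, "/a.html"),
        ("10.0.0.1", "Mozilla/4.0", t0 + timedelta(minutes=10), "/b.html"),
        ("10.0.0.1", "Mozilla/4.0", t0 + timedelta(minutes=50), "/c.html")]
for user, navs in split_navigations(reqs).items():
    print(user, [len(n) for n in navs])
```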
Finally, the obtained log files were stored in a relational database in the data summarization step.
Advanced data preprocessing:
We performed a general data selection step in which we selected from the relational DB the navigations (users' visits) to analyze later, using the following criteria:
- navigation duration > 60 seconds,
- number of requests in the navigation > 10,
- browsing speed (duration / number of requests) > 4.
This has reduced the number of analyzed navigations to 9625 for $L_1$ and to 9309 for $L_2$.
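The three selection criteria listed above translate into a simple predicate. In this sketch a navigation is assumed to be a sorted list of request times expressed in seconds, and the browsing speed is assumed to be measured in seconds per request; neither the representation nor the unit is stated explicitly in the text.

```python
def keep_navigation(timestamps):
    """Apply the three selection criteria to one navigation.

    timestamps: sorted request times, in seconds, for a single navigation.
    """
    n = len(timestamps)
    if n == 0:
        return False
    duration = timestamps[-1] - timestamps[0]
    return (duration > 60            # navigation duration > 60 seconds
            and n > 10               # more than 10 requests
            and duration / n > 4)    # browsing speed (duration / requests) > 4

# Hypothetical navigations: 11 requests every 10 s, then 20 requests every 4 s.
print(keep_navigation([10 * i for i in range(11)]))
print(keep_navigation([4 * i for i in range(20)]))
```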
Next, depending on the analysis, we performed secondary data selections. For example, in the secondary data selection associated with our first analysis (section 4) we decided to keep only the visits on both INRIA servers and to cluster the visited first level topics (from the visited URLs). Thus the number of analyzed navigations was reduced to 3905 for $L_1$ and to 3513 for $L_2$. Also, the Web pages returned by the Web server with an error status code ($\geq 400$) were ignored in our analysis. We have found a total of 190 visited topics for $Per_1$ (78 were research teams). For $Per_2$ we found 210 topics, of which 86 were research teams (49 current research teams and 37 old research teams from $Per_1$).
were assigned to different research teams for later clustering. Since the organization of INRIA's research teams has changed (starting from 1st of April 2004), its Web site structure changed accordingly. The research teams were reorganized from the four existing research themes into five new research themes (Appendix A).
We decided to analyze the impact of the Web site structure on users' navigations before and after this change (during $Per_1$ and $Per_2$). This was done by performing two different analyses of users' visits: one on INRIA's first level topics and another on INRIA's research teams. For the second analysis, since a user visit is actually a set of visited URLs, we needed to determine which URLs belong to which research teams.
Each URL can have several topics associated with different semantic topics:
http://www-sop.inria.fr/axis/personnel/Doru.Tanasa/doru-eng.html
Site = http://www-sop.inria.fr, topic1 = axis, topic2 = personnel, topic3 = Doru.Tanasa
Thus, in a first step, a URL was assigned to a research team when one of its topics was the research team itself. After this, complementary information on INRIA's Web site was used to assign URLs to research teams. For example, the URL http://www.inria.fr/recherche/equipes/axis.en.html does not contain any research team topic, but is the AxIS research team presentation page on INRIA's main server.
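The first assignment step can be sketched with the standard `urllib.parse` module. The helper names `url_topics` and `assign_team`, the rule that a path not ending in "/" ends with a file name, and the file name in the first example URL (a reading of the garbled original) are assumptions; the second step, which relies on complementary information about the site, is not covered.

```python
from urllib.parse import urlparse

def url_topics(url):
    """Split a URL into its site and its list of topics (path directories)."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    # Assumption: when the path does not end with "/", the last segment is
    # a file name and not a topic.
    if segments and not parsed.path.endswith("/"):
        segments = segments[:-1]
    return parsed.netloc, segments

def assign_team(url, team_names):
    """First step only: return the first topic that is a known team, if any."""
    _, topics = url_topics(url)
    return next((t for t in topics if t.lower() in team_names), None)

teams = {"axis", "orion"}   # hypothetical subset of research team names
print(url_topics("http://www-sop.inria.fr/axis/personnel/Doru.Tanasa/doru-eng.html"))
print(assign_team("http://www.inria.fr/recherche/equipes/axis.en.html", teams))
```

The second call finds no team among the topics, which is why the presentation page mentioned above needs the complementary information to be attached to AxIS.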
After the data generalization step and in order to cluster the obtained topics, we needed to compute the dissimilarity matrix used as input for the AHC and our 2-3 AHC algorithms. For this, we used the Jaccard similarity index on the visited topics as defined in [4]. As in [4], we represented each navigation by a binary vector of the visited topics: position i in the vector is 0 if $topic_i$ was not visited and 1 if $topic_i$ was visited during the navigation. Based on these vectors and aiming to define a similarity/dissimilarity between two topics $R_i$ and $R_j$, we define the four following quantities:
- a as the number of counts when $R_i^k = R_j^k = 1$,
- b as the number of counts when $R_i^k = 0$ and $R_j^k = 1$,
- c as the number of counts when $R_i^k = 1$ and $R_j^k = 0$,
- d as the number of counts when $R_i^k = 0$ and $R_j^k = 0$.
Then the similarity between two topics $R_i$ and $R_j$ is computed using $S(R_i, R_j) = \frac{a}{a + b + c}$, which represents the probability of visiting both topics when at least one of them is visited. The dissimilarity matrix used as input for the classical AHC and our 2-3 AHC was computed using the dissimilarity $1 - S(R_i, R_j)$.
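The computation of the dissimilarity matrix from the binary navigation vectors can be sketched as follows. The function name, the list-of-lists representation and the toy data are assumptions, the index k is read as ranging over the navigations, and the convention that two topics never visited get similarity 0 is a choice made for this example, not something stated in the text.

```python
def jaccard_dissimilarity_matrix(navigations):
    """Topic-by-topic dissimilarity 1 - a / (a + b + c).

    navigations: list of binary vectors; navigations[k][i] is 1 when
    topic i was visited during navigation k, and 0 otherwise.
    """
    n_topics = len(navigations[0])
    diss = [[0.0] * n_topics for _ in range(n_topics)]
    for i in range(n_topics):
        for j in range(i + 1, n_topics):
            a = sum(1 for v in navigations if v[i] == 1 and v[j] == 1)
            b = sum(1 for v in navigations if v[i] == 0 and v[j] == 1)
            c = sum(1 for v in navigations if v[i] == 1 and v[j] == 0)
            # Assumed convention: similarity 0 when neither topic is visited.
            s = a / (a + b + c) if a + b + c else 0.0
            diss[i][j] = diss[j][i] = 1.0 - s
    return diss

# Hypothetical data: four navigations over three topics.
navs = [[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1],
        [1, 1, 1]]
for row in jaccard_dissimilarity_matrix(navs):
    print(row)
```

The quantity d (neither topic visited) does not enter the formula, which is why it is not computed here.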
4 Results
For our first analysis, we have focused on the distribution of research teams in the server-crossed visited topics. We have selected from $Per_1$ all server-crossed navigations (visiting both Web sites: the main one and Sophia's), and then clustered all obtained visited topics using our 2-3 AHC algorithm. Table 1 presents the distribution of only the research team topics in the obtained clusters (the other topics are not presented here). Also, we did not represent the one-element clusters (the "outliers"): caiman 4B, saga 2B, meije SOP 1C, sysdys SOP 4B, chir 4A, cafe 2B, codes 2B, visa SOP 4A, tropics 1A, omega 4B. We added after the name of each research team its theme and sub-theme, as well as its site (empty for the main site and "SOP" for Sophia's site).
epidaure SOP 3B, odyssee SOP 3B, icare 4A, miaou SOP 4A,    orion 3A
epidaure 3B, ariana SOP 3B,    reves SOP 3B, miaou 4A,
ariana 3B    chir SOP 4A, comore 4A,
caiman SOP 4B
prisme SOP 2B, prisme 2B    koala SOP 2A, koala 2A,    odyssee 3B, dream SOP 3A,
croap SOP 2A, croap 2A    lemme 2A, opale SOP 4B,
opale 4B, certilab 2A,
pastis 3B
orion SOP 3A, acacia SOP 3A,    coprin SOP 2B, saga SOP 2B,    sinus SOP 4B, sinus 4B,
acacia 3A, axis SOP 3A,    saga 2B    smash SOP 4B
orion 3A, aid SOP 3A,
aid 3A
robotvis SOP 3B, robotvis 3B,    mimosa SOP 1C, mimosa 1C,    sloop SOP 1A, sloop 1A,
odyssee SOP 3B    tick SOP 1C, tick 1C    oasis SOP 2A, oasis 2A
rodeo SOP 1B, rodeo 1B,    lemme SOP 2A, tropics SOP 1A,    mistral SOP 1B, mistral 1B
planete SOP 1B, planete 1B    mascotte SOP 1B, omega SOP 4B,
galaad SOP 2B, cafe SOP 2B,
certilab SOP 2A
mefisto SOP 4B, mefisto 4B    mascotte SOP 1B, mascotte 1B    safir SOP 2B, safir 2B
meije SOP 1C, meije 1C
Table 1. INRIA's Web site topics clustering using 2-3 AHC for $Per_1$
As we can see, the distribution of research teams usually corresponds to their theme membership (16 out of the 19 non-trivial clusters contain research teams from the same theme). Also, old research teams that have been replaced by new research teams are in the same clusters as the corresponding new ones, since their pages are strongly interconnected. For example: aid was replaced by axis (cluster 7), rodeo by planete (cluster 13), etc.
Fig. 5. 2-3 hierarchy on theme 3 projects during $Per_1$
section 3.2, only those visiting at least one page on INRIA's main server. From these navigations, we chose to analyze the visited topics (research teams) only from the main server pages and to cluster only the research team topics for theme 3 ($Per_1$ and $Per_2$) and for theme Cog ($Per_2$).
Thus, we have first selected only those navigations containing at least one visit of the theme 3 pages on INRIA's main server during $Per_1$. From these navigations, we have clustered the visited topics on the main server only, corresponding to research teams from theme 3 (Figure 5). In the resulting classification we can distinguish two main clusters (82 and 85) that group almost all elements from the two sub-themes of theme 3 (see Appendix A).
Fig. 6. 2-3 hierarchy on theme 3 projects during $Per_2$
Fig. 7. 2-3 hierarchy on theme Cog projects during $Per_2$
Next, we have selected for the second period the navigations visiting at least one of the new themes' pages on INRIA's main server. From these navigations we have focused again on the topics corresponding to research teams from the "old" theme 3, only on the main server's pages (cf. Figure 6). We refer in this classification to the new theme Bio, which groups together in cluster 73 the four [...] they were either replaced or stopped, but their Web pages are still accessible on the Internet (i.e. sharp, opera, verso).
Finally, we have selected only the navigations that had at least one visit of the new theme Cog pages during $Per_2$. We have then clustered only the topics on the main server's pages corresponding to a research team from theme Cog (cf. Figure 7). We note that users have the tendency to visit all Cog A research teams, while for the other sub-themes there is a certain variability and a deeper analysis is needed (possible causes: events like conferences or seminars presented on INRIA's Web site that affect users' visits, the recent change of structure, etc.). We also found that our research team, AxIS, is grouped in the two time periods with different research teams.
Fig. 8. Classical hierarchy on theme Cog projects
Fig. 9. 2-3 hierarchy on theme Cog projects
In our final analysis we have compared the classical AHC method and our 2-3 AHC algorithm, by clustering the research teams from theme Cog (during $Per_2$). The data selection was the same as in the previous analysis: navigations visiting at least one of the theme Cog pages, topics only from the main server's pages and representing research teams from theme Cog.
Figures 8 and 9 present a partial output (containing all 3B research teams) of the classical AHC and of our 2-3 AHC algorithm, respectively. The 2-3 hierarchy obtained contains more created clusters than the classical hierarchy (22 against 15), and thus more information. For example, analyzing cluster 54 in Figure 9, we can say that research teams ariana, epidaure and odyssee have a "stronger" probability of being visited together, compared with the one given by the classical
This paper presents the first application of a 2-3 AHC algorithm on Web usage data, and shows the potential of our algorithm compared with the classical AHC algorithm. In our study, we were interested in clustering the visited topics of INRIA's Web site, which recently (1st April 2004) changed its structure.
We have studied the impact of INRIA's Web site structure on users' navigations, during two time periods (before and right after the site structure change). Although the second analyzed period was shortly after the change, we have found that users' navigations are usually influenced by the Web site structure.
Our ongoing and future work concerns the following topics:
- a deeper analysis of the comparison between 2-3 AHC and AHC on the same Web usage data, using different aggregation links,
- a better dissimilarity measure, for example the generalized Jaccard index, which takes into account the number of visited pages for a topic, not just its presence (count vs. binary),
- application of our 2-3 AHC algorithm on other data inferred from the activity reports of INRIA's research teams, and comparison of the results with the ones obtained here.
Acknowledgements: We would like to thank Mihai Jura for his help on the preprocessing phase and Sophie Honnorat for her help with the English corrections.
A INRIA research teams organization
Before 1st of April 2004, INRIA's research teams were organized in four different research themes, namely:
- Theme 1: Networks and systems:
  - A: Architectures and Systems,
  - B: Networks and Telecommunications,
  - C: Distributed and Real-Time Programming.
- Theme 2: Software engineering and symbolic computing:
  - A: Semantics and Programming,
  - B: Algorithms and Computational Algebra.
- Theme 3: Human-computer interaction, image processing, data management, knowledge systems:
  - A: Databases, Knowledge Bases, Cognitive Systems,
  - B: Vision, Image Analysis and Synthesis.
- Theme 4: Simulation and optimization of complex systems:
  - A: Control, Robotics, Signal,
  - B: Modelling and Scientific Computing.
After this date, the research teams were reorganized in the following five research themes:
- [...]
  - A: Distributed systems and software architecture,
  - B: Networks and telecoms,
  - C: Embedded systems and mobility,
  - D: Architecture and compiling.
- Theme Cog: Cognitive systems:
  - A: Statistical modeling and machine learning,
  - B: Perception, indexing and communication for images and video,
  - C: Multimedia data: interpretation and man-machine interaction,
  - D: Image synthesis and virtual reality.
- Theme Sym: Symbolic systems:
  - A: Reliability and safety of software,
  - B: Algebraic and geometric structures, algorithms,
  - C: Management and processing of language and data.
- Theme Num: Numerical systems:
  - A: Control and complex systems,
  - B: Grids and high-performance computing,
  - C: Optimization and inverse problems for stochastic or large-scale systems,
  - D: Modeling, simulation and numerical analysis.
- Theme Bio: Biological systems:
  - A: Modeling and simulation in biology and medicine.
References
1. P. Bertrand. Set systems for which each set properly intersects at most one other set - Application to Cluster Analysis. Research Report Ceremade 0202, Université Paris-9, France, 2002.
2. S. Chelcea, P. Bertrand, and B. Trousse. Agglomerative 2-3 Hierarchical Clustering: theoretical improvements and tests. In 27th Annual Conference of the Gesellschaft für Klassifikation, Cottbus, Germany, 12-14 March 2003.
3. S. Chelcea, P. Bertrand, and B. Trousse. Un Nouvel Algorithme de Classification Ascendante 2-3 Hiérarchique. In Reconnaissance des Formes et Intelligence Artificielle (RFIA 2004), Centre de Congrès Pierre BAUDIS, Toulouse, 28-30 January 2004.
4. A. El Golli, B. Conan-Guez, F. Rossi, D. Tanasa, B. Trousse, and Y. Lechevallier. Les cartes topologiques auto-organisatrices pour l'analyse des fichiers logs. In 11èmes Rencontres de la Société Francophone de Classification, Bordeaux, 8-10 September 2004. To appear.
5. A.R. Johnson and D.W. Wichern. Applied Multivariate Statistical Analysis, chapter 12. Prentice Hall, 1982.
6. R.R. Sokal and P.H.A. Sneath. Principles of Numerical Taxonomy. Freeman, San Francisco, 1963.
7. D. Tanasa and B. Trousse. Advanced data preprocessing for intersites Web usage mining. IEEE Intelligent Systems, 19(2):59-65, March-April 2004.