
Application of the 2-3 Agglomerative Hierarchical Classification on Web Usage Data.


HAL Id: inria-00000289

https://hal.inria.fr/inria-00000289

Submitted on 22 Sep 2005



Sergiu Chelcea, Brigitte Trousse

To cite this version:

Sergiu Chelcea, Brigitte Trousse. Application of the 2-3 Agglomerative Hierarchical Classification on Web Usage Data. SYNASC 2004, 6th International Workshop on Symbolic and Numeric Algorithms for Scientific Computing, Sep 2004, Timisoara, Romania. inria-00000289


Application of the 2-3 Agglomerative Hierarchical Classification on Web usage data

Sergiu Chelcea and Brigitte Trousse

AxIS Research Team, INRIA
Sophia-Antipolis, France
BP 93, 06902 Sophia-Antipolis Cedex
FirstName.LastName@inria.fr

Abstract. In this paper we present a clustering of the topics visited on INRIA's Web sites, using a hierarchical classification method, the 2-3 Hierarchical Ascending Classification proposed by Patrice Bertrand in [1]. The obtained clusters are then analyzed for their relevance.

Keywords and phrases: Clustering, Web usage data, Classification, Data Mining, 2-3 Hierarchies

1 Introduction

Nowadays the rapid and continuous growth of the Web can be a serious obstacle for users in their information search, despite the improvement of existing search engines. Personalizing the Web space by analyzing users' behaviour can shorten their quest for information and help them find more easily what they are looking for. For this, Data Mining techniques can be used for Web data analysis, in one of the three main Web Mining domains: Web Content Mining, Web Structure Mining and Web Usage Mining (WUM).

The Web usage data used during the WUM process are generally the users' navigational paths gathered in Web server logs, sometimes correlated with information from the other Web Mining processes, e.g. the site structure. The analysis of users' behaviour helps in the redesign of the Web site(s) and also facilitates users' information search (through dynamically inserted links).

In this paper we present a clustering of the topics visited on INRIA's Web sites, using a hierarchical classification method, the 2-3 Hierarchical Ascending Classification proposed by Patrice Bertrand in [1]. The 2-3 Agglomerative Hierarchical Clustering (2-3 AHC) [1] generalizes the Agglomerative Hierarchical Clustering (AHC) by giving each cluster the possibility of intersecting at most one other cluster, when the obtained intersection is distinct from the two clusters. This characteristic allows the resulting cluster structure to highlight groups of objects (clusters) that have the common characteristics of two other groups (which is not possible with the classical AHC).

Using our new 2-3 AHC algorithm [3], which has the same $O(n^2 \log n)$ algorithmic complexity as the classical AHC, we have analyzed the activities of the visitors of INRIA's Web sites based on their behaviour.


In the first subsection we present some hierarchical classification notions along with the classical AHC algorithm. The second subsection presents the 2-3 AHC concept introduced in [1], followed by a short description of our 2-3 AHC algorithm [3], used later along with the classical AHC algorithm to classify the Web usage data (Section 3.2).

2.1 Agglomerative Hierarchical Classification

Clustering, or unsupervised classification, is a Data Mining technique used to group similar objects into classes, also known as clusters. Well-known clustering techniques include neural networks, hierarchical classification methods, fuzzy networks, decision trees, etc. Each of these clustering techniques generates a cluster set organized by its specific structure (partitions, hierarchies, pyramids, etc.).

In the agglomerative hierarchical methods, starting from the initial elements (the singletons), the clusters are successively merged into higher-level clusters, until the entire set of analyzed objects becomes a cluster. The resulting hierarchical structures (hierarchies, 2-3 hierarchies, pyramids) can then be easily visualized using a graph called a dendrogram.

In order to present the 2-3 Agglomerative Hierarchical Classification [1] and our 2-3 AHC algorithm [3], we first recall the classical AHC principle.

Sokal and Sneath proposed in 1963 [6] the first version of the classical AHC algorithm, consisting of two phases: initialization and merging.

During the first phase, the distance (dissimilarity) matrix of the singletons is computed using a distance (dissimilarity) measure (e.g. the Euclidean distance), while the singletons represent the initial set of clusters. In the second phase, successive mergings are performed between the two closest clusters, until the initial objects are all merged into a final cluster. The two clusters are closest in the sense of a chosen aggregation link, denoted $\mu$ and simply called link (e.g. single link, complete link, average link). Together with this criterion, others such as the cardinality and the lexicographical order can be used to determine the closest clusters.

The link $\mu(X, Y)$ between two merged clusters $X$ and $Y$ represents the heterogeneity degree of the resulting cluster $X \cup Y$, and is denoted $f(X \cup Y)$. In the first phase of the AHC algorithm, the $f$ values of the singletons are set to 0. At the end of the second phase, the resulting structure is a sequence of nested partitions which can be visualized using a dendrogram (a graph based on the heterogeneity degree $f$ of the clusters).
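The two phases described above can be illustrated with a deliberately naive sketch. It is not the authors' implementation and does not reach the $O(n^2 \log n)$ complexity mentioned earlier: it rescans all cluster pairs at every step. The function names `single_link` and `ahc`, the representation of clusters as frozensets of indices, and the toy matrix `D` are assumptions made for this example only.

```python
from itertools import combinations

def single_link(x, y, d):
    """Single link: smallest dissimilarity between a member of x and a member of y."""
    return min(d[i][j] for i in x for j in y)

def ahc(d, link=single_link):
    """Naive classical AHC. Returns the merges as tuples (X, Y, f(X u Y))."""
    # Initialization phase: every object is a singleton cluster (f = 0).
    clusters = [frozenset([i]) for i in range(len(d))]
    merges = []
    # Merging phase: repeatedly merge the two closest clusters.
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda p: link(p[0], p[1], d))
        level = link(x, y, d)  # heterogeneity degree f of the new cluster
        # Only the merged cluster is kept for future mergings.
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
        merges.append((x, y, level))
    return merges

# Toy dissimilarity matrix over three objects 0, 1, 2 (invented values).
D = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
merges = ahc(D)
```

On this toy matrix, objects 0 and 1 are merged first at level 1, and the resulting cluster is then merged with object 2 at level 2. Ties are broken here by enumeration order, whereas the text mentions cardinality and lexicographical order as possible secondary criteria.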

A small example of a classical hierarchy is presented in Figure 1. In order to choose a partition from the resulting hierarchy, the differences between the $f$ levels of the created clusters are analyzed. Usually the partitioning level is chosen wherever there is a "big" difference between the levels of the created clusters. [...] generated by $P_2$, since the clusters are more homogeneous.

Fig. 1. Classical hierarchy

Fig. 2. Hierarchy partitioning
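The rule of cutting the hierarchy where consecutive $f$ levels differ the most can be sketched as follows. This is only one possible reading of the "big difference" criterion: the helper names `cut_level` and `partition`, the choice of the midpoint of the largest gap as threshold, and the example merges are all assumptions. The sketch also assumes at least two distinct levels and levels that do not decrease along the merge sequence.

```python
def cut_level(levels):
    """Return a threshold in the middle of the largest gap between consecutive f levels."""
    ordered = sorted(set(levels))
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(ordered, ordered[1:])]
    _, lo, hi = max(gaps)
    return (lo + hi) / 2

def partition(merges, n, threshold):
    """Apply, in creation order, only the merges whose level is below the threshold."""
    clusters = {frozenset([i]) for i in range(n)}
    for x, y, level in merges:
        if level < threshold:
            clusters -= {x, y}
            clusters.add(x | y)
    return clusters

# Invented merge sequence over five objects: (X, Y, f(X u Y)).
example_merges = [
    (frozenset({0}), frozenset({1}), 1.0),
    (frozenset({2}), frozenset({3}), 1.5),
    (frozenset({0, 1}), frozenset({4}), 2.0),
    (frozenset({0, 1, 4}), frozenset({2, 3}), 6.0),
]
threshold = cut_level([level for _, _, level in example_merges])
chosen = partition(example_merges, 5, threshold)
```

Here the largest gap lies between levels 2.0 and 6.0, so the threshold is 4.0 and the chosen partition consists of the two clusters {0, 1, 4} and {2, 3}.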

Whenever a new cluster $X \cup Y$ is created by merging two other clusters $X$ and $Y$, we say that $X \cup Y$ is a predecessor of $X$ and $Y$, while $X$ and $Y$ are successors of $X \cup Y$ (cf. Figure 1). Clusters found on the same $f$ level as their predecessor are not relevant from the viewpoint of cluster analysis, and can be eliminated from the hierarchy after its creation. This final elimination step is called the refinement step and produces a strictly indexed hierarchy, making the corresponding dendrogram easier to visualize.

The main characteristic of the classical AHC algorithm is that after each merging, only the resulting cluster is kept for future mergings. Thus the resulting structure (the hierarchy) will contain only nested or disjoint clusters, also called hierarchical clusters.

A hierarchy induces a new distance matrix (an ultrametric) over the initial elements, based on the distances at which they were first grouped in a cluster. This matrix can then be compared with the initial matrix, or with the matrices induced by other methods, for quality analysis [3] using different indices (e.g. the Stress formula [5]).
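As an illustration of the induced matrix, the sketch below fills in, for every pair of elements, the level at which they were first grouped by a classical AHC, and then compares the result with the initial matrix. The function `discrepancy` is a Kruskal-style normalized index used here as an assumption; the exact index used in [3] and [5] may differ. The merge list and the matrix `D` are invented.

```python
from math import sqrt

def induced_ultrametric(merges, n):
    """Distance induced by a hierarchy: level at which two elements are first grouped."""
    u = [[0] * n for _ in range(n)]
    for x, y, level in merges:
        # x and y are disjoint, so each pair (i, j) is first grouped by this merge.
        for i in x:
            for j in y:
                u[i][j] = u[j][i] = level
    return u

def discrepancy(d, u):
    """Normalized root of the squared differences between two matrices (assumed index)."""
    pairs = [(i, j) for i in range(len(d)) for j in range(i + 1, len(d))]
    num = sum((d[i][j] - u[i][j]) ** 2 for i, j in pairs)
    den = sum(d[i][j] ** 2 for i, j in pairs)
    return sqrt(num / den)

# Invented initial matrix and the single-link merges it produces.
D = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
single_link_merges = [
    (frozenset({0}), frozenset({1}), 1),
    (frozenset({2}), frozenset({0, 1}), 2),
]
U = induced_ultrametric(single_link_merges, 3)
score = discrepancy(D, U)
```

Only the pair (0, 2) changes in this example (from 4 to 2), which is the usual effect of the single link: induced distances never exceed the initial ones.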

2.2 2-3 Hierarchies and 2-3 AHC Algorithm

We present here the 2-3 hierarchy concept along with the 2-3 Agglomerative Hierarchical Classification method, introduced in [1] in order to generalize the classical AHC and make it more flexible.

As we saw before, the AHC generates clusters that are either disjoint or included one in the other. The 2-3 Agglomerative Hierarchical Classification method proposed in [1] gives each cluster the possibility of intersecting at most one other cluster, when the obtained intersection is distinct from the two clusters. This characteristic allows the obtained cluster structure to highlight groups of objects having the common characteristics of two other groups (not possible with the AHC).

The resulting cluster structure is called a 2-3 hierarchy, a term justified by [...] formed cluster pairs are hierarchical (nested or disjoint).

The term 2-3 hierarchy specifies how the set of 2-3 hierarchies is an extension of the set of hierarchies: indeed, from the definition above it clearly follows that a hierarchy is a particular case of a 2-3 hierarchy. This happens when all three possible cluster pairs are hierarchical, thus leading to a classical hierarchy.

A 2-3 hierarchy example is presented in Figure 3 below. When two clusters are not hierarchical, we say that they properly intersect each other: $X$ properly intersects $Y$ $\iff$ $X \cap Y \notin \{X, Y, \emptyset\}$. For instance, in the example from Figure 3, the cluster $\{b, a\}$ properly intersects $\{a, c\}$, while the clusters $\{b, a\}$ and $\{b, a, c\}$ are hierarchical.
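The definition of proper intersection translates directly into a small predicate. The second helper checks only the property stated in the text, namely that each cluster properly intersects at most one other cluster; it is not a full validation of the 2-3 hierarchy definition from [1]. The function names and the toy clusters are assumptions.

```python
def properly_intersects(x, y):
    """True when the intersection of x and y is neither x, nor y, nor empty."""
    inter = x & y
    return bool(inter) and inter != x and inter != y

def at_most_one_proper_intersection(clusters):
    """Check that every cluster properly intersects at most one other cluster."""
    return all(
        sum(properly_intersects(x, y) for y in clusters) <= 1
        for x in clusters
    )

# Toy clusters over the elements a, b, c.
ba = frozenset("ba")
ac = frozenset("ac")
bac = frozenset("bac")
toy_clusters = [ba, ac, bac]
```

With these sets, `properly_intersects(ba, ac)` is true because the intersection {a} differs from both clusters, while `ba` and `bac` are nested and therefore hierarchical.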

Fig. 3. Example of a 2-3 hierarchy

Since 2-3 hierarchies allow clusters to properly intersect each other, their structures are richer than the classical hierarchies obtained on the same data sets [3]. For example, the maximal number of clusters created by the classical AHC is $n - 1$, compared with $\lfloor \frac{3}{2}(n - 1) \rfloor$ for the 2-3 AHC [1], where $n$ is the initial number of elements. Comparative tests [3] have shown an 11% (with single link) to 45% (with complete link) increase in the number of created clusters for our 2-3 AHC algorithm. Figure 4 shows a small example of a hierarchy and a 2-3 hierarchy created on a three-point data set, using the single link.

Fig. 4. AHC and 2-3 AHC
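The two bounds on the number of created clusters quoted above, $n - 1$ and $\lfloor \frac{3}{2}(n - 1) \rfloor$, can be evaluated for a few sizes. The function names below are assumptions.

```python
def max_clusters_ahc(n):
    """Maximal number of clusters created by the classical AHC on n elements."""
    return n - 1

def max_clusters_23ahc(n):
    """Maximal number of clusters created by the 2-3 AHC: floor(3/2 * (n - 1))."""
    return (3 * (n - 1)) // 2

bounds = {n: (max_clusters_ahc(n), max_clusters_23ahc(n)) for n in (3, 10, 100)}
```

For a three-point data set the bounds are 2 and 3 created clusters, and for 100 elements they are 99 and 148.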

During our experiments on the Web usage data presented in the next section, we have used our simplified 2-3 AHC algorithm proposed in [2]. Compared with the initial 2-3 AHC algorithm [1], the advantages of our 2-3 AHC algorithm are a smaller complexity ($O(n^2 \log n)$ instead of the initial $O(n^3)$), and [...]

3.1 Motivations

Nowadays the rapid and continuous growth of the Web can be a serious obstacle for users in their information search, despite the improvements of existing search engines. Personalizing the Web space by analyzing users' behaviour can shorten their quest for information and help them find more easily what they are looking for. For this, Data Mining techniques can be used for Web data analysis, in one of the three main Web Mining domains: Web Content Mining, Web Structure Mining and Web Usage Mining (WUM).

The Web usage data used during the WUM process are generally the users' navigational paths gathered in Web server logs, sometimes correlated with information from the other Web Mining processes, e.g. the site structure. The analysis of users' behaviour helps in the redesign of the Web site(s) and also facilitates users' information search (through dynamically inserted links).

Users searching for information on INRIA's Web site transparently browse through the interconnected pages of INRIA's Web servers. To trace their behaviour, we have analyzed the access log files from two INRIA Web servers: the national Web server (http://www.inria.fr) and the Sophia Antipolis research unit Web server (http://www-sop.inria.fr).

Since INRIA's scientific organization into research teams has recently changed (1st April 2004), we have chosen to study the Web users' behaviour over two different 15-day time periods:

- from 01 until 15 January 2003, a period denoted in the following as $Per_1$,
- and from 27 May until 10 June 2004, a period denoted as $Per_2$.

Indeed, the main motivation of our study was to analyze the impact of the changes in the Web site structure (see Appendix A) on users' behaviour when searching for information. More particularly, our study concerned the clustering of the topics visited on INRIA's Web sites, using the 2-3 Hierarchical Ascending Classification presented in the previous section, and was done in two phases:

- first, the preprocessing of the Web access logs (based on the work of Tanasa et al. [7]), presented in Section 3.2,
- secondly, the data mining (using our 2-3 AHC algorithm) and the result analysis phase (Section 4).

3.2 Data preprocessing

In this subsection, we briefly explain the data preprocessing methodology proposed by Tanasa et al. (see [7] for more details). We used the AxISLogMiner tool developed within the AxIS research team at INRIA Sophia Antipolis.

The aim of the preprocessing phase was to identify and extract user navigations (sequences of user actions) from the raw Web logs, and was done in four steps: data fusion, data cleaning, data structuration and data summarization.

During the data fusion step, the Web log files were joined together for each analyzed period (resulting in log $L_1$ for $Per_1$ and log $L_2$ for $Per_2$), in order to reconstruct the users' cross-server navigations.

Thus the two joined logs contained all the requests (chronologically sorted) made by different users for different resources on the two Web servers, over the given periods of time. Some of these requests were made for resources that are not relevant from our analysis viewpoint and were eliminated in the data cleaning step. For example, we are not interested in requests for images, since they usually are implicit requests (images contained in the accessed page [7]). Also, the requests made by Web crawlers (robots) have been eliminated from the Web logs, since they do not come from human visitors. Filtering out all these requests has reduced $L_1$ to 11% and $L_2$ to 15% of their original size. For example, for $L_2$, the number of requests was reduced from 4,473,228 to 686,084, equivalent to a log file size reduction from 901 MB to 135 MB.
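A minimal sketch of such a cleaning filter is given below. It is not the AxISLogMiner implementation: the dictionary keys `url` and `agent`, the list of image extensions and the substring heuristics used to spot robots are all assumptions made for illustration. The last lines simply recompute the reduction ratio reported for $L_2$.

```python
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")  # assumed list
ROBOT_MARKERS = ("bot", "crawler", "spider")                  # assumed heuristics

def is_relevant(request):
    """Keep a request unless it targets an image or comes from a known robot."""
    path = request["url"].split("?", 1)[0].lower()
    agent = request["agent"].lower()
    if path.endswith(IMAGE_EXTENSIONS):
        return False
    if any(marker in agent for marker in ROBOT_MARKERS):
        return False
    return True

# Invented log entries.
raw_requests = [
    {"url": "/axis/index.html", "agent": "Mozilla/4.0"},
    {"url": "/axis/logo.gif", "agent": "Mozilla/4.0"},
    {"url": "/axis/index.html", "agent": "ExampleBot/1.0"},
]
cleaned = [r for r in raw_requests if is_relevant(r)]

# Share of the L2 requests kept after cleaning, from the figures in the text.
kept_ratio = 686084 / 4473228
```

Only the first invented request survives the filter, and `kept_ratio` is about 0.153, consistent with the reported 15%.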

Next, the data structuration step grouped the unstructured requests of the log files by user, user session, page view, and visit (navigation). Since log files contain only the computer IP and (sometimes) the user agent, we have considered as a user the couple (IP, [User Agent]) and as a user session all the actions performed by the user over the analyzed period.

Then the users' navigations were obtained by splitting every user session using a 30-minute threshold: 173,015 navigations for $L_1$ and 145,454 navigations for $L_2$.
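The splitting of a user session into navigations can be sketched as follows, under the assumption that the 30-minute threshold applies to the delay between two consecutive requests of the same user. Timestamps are given in seconds, and the function name `split_navigations` is hypothetical.

```python
def split_navigations(timestamps, threshold=30 * 60):
    """Split one user session into navigations at gaps longer than the threshold."""
    navigations = []
    for t in sorted(timestamps):
        if navigations and t - navigations[-1][-1] <= threshold:
            navigations[-1].append(t)   # same navigation: gap within threshold
        else:
            navigations.append([t])     # first request, or gap too long
    return navigations

# Invented request times (in seconds) for one user.
session = [0, 600, 1200, 4000, 4100]
visits = split_navigations(session)
```

The gap of 2800 seconds between the third and fourth requests exceeds 1800 seconds, so this session yields two navigations.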

Finally, the obtained log files were stored in a relational database in the data summarization step.

Advanced data preprocessing:

We performed a general data selection step in which we selected from the relational database the navigations (users' visits) to analyze later, using the following criteria:

- navigation duration > 60 seconds,
- number of requests in the navigation > 10,
- browsing speed (duration / number of requests) > 4.

This has reduced the number of analyzed navigations to 9625 for $L_1$ and to 9309 for $L_2$.
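The three selection criteria can be combined into a single predicate. The sketch assumes that a navigation is given as a list of request times in seconds, that its duration is the time between its first and last request, and that the browsing speed is therefore expressed in seconds per request; none of these details is stated explicitly in the text.

```python
def keep_navigation(timestamps):
    """Apply the three selection criteria to one navigation (times in seconds)."""
    duration = max(timestamps) - min(timestamps)
    n_requests = len(timestamps)
    return (
        duration > 60
        and n_requests > 10
        and duration / n_requests > 4
    )

# Invented navigations.
steady = list(range(0, 120, 10))     # 12 requests over 110 seconds
too_fast = list(range(0, 24, 2))     # 12 requests over 22 seconds
too_short = [0, 100, 200, 300, 400]  # only 5 requests
selected = [nav for nav in (steady, too_fast, too_short) if keep_navigation(nav)]
```

Only the first navigation satisfies all three conditions.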

Next, depending on the analysis, we performed secondary data selections. For example, in the secondary data selection associated with our first analysis (Section 4) we decided to keep only the visits on both INRIA servers and to cluster the visited first-level topics (from the visited URLs). Thus the number of analyzed navigations was reduced to 3905 for $L_1$ and to 3513 for $L_2$. Also, the Web pages returned by the Web server with an error status code ($\geq 400$) were ignored in our analysis. We found a total of 190 visited topics for $Per_1$ (78 were research teams). For $Per_2$ we found 210 topics, of which 86 were research teams (49 current research teams and 37 old research teams from $Per_1$).

[...] were assigned to different research teams for later clustering. Since the organization of INRIA's research teams has changed (starting from 1st April 2004), its Web site structure changed accordingly. The research teams were reorganized from the four existing research themes into five new research themes (Appendix A).

We decided to analyze the impact of the Web site structure on users' navigations before and after this change (during $Per_1$ and $Per_2$). This was done by performing two different analyses of users' visits: one on INRIA's first-level topics and another on INRIA's research teams. For the second analysis, since a user visit is actually a set of visited URLs, we needed to determine which URLs belong to which research teams.

Each URL can contain several topics, associated with different semantic topics:

http://www-sop.inria.fr/axis/personnel/Doru.Tanasa/doru-eng.html

where http://www-sop.inria.fr is the site, axis is topic 1, personnel is topic 2 and Doru.Tanasa is topic 3.

Thus, in a first step, a URL was assigned to a research team when one of its topics was the research team itself. After this, complementary information on INRIA's Web site was used to assign URLs to research teams. For example, the URL http://www.inria.fr/recherche/equipes/axis.en.html does not contain any research team topic, but is the AxIS research team presentation page on INRIA's main server.
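The two-step assignment of URLs to research teams might be sketched as below. The set `TEAMS` and the lookup table `EXTRA_PAGES` are small hypothetical stand-ins for the complete team list and for the complementary information mentioned in the text; the sketch also assumes that the last path segment of a URL is a page name rather than a topic.

```python
from urllib.parse import urlsplit

TEAMS = {"axis", "orion"}  # illustrative subset of team names (assumption)
EXTRA_PAGES = {            # hypothetical complementary information
    "http://www.inria.fr/recherche/equipes/axis.en.html": "axis",
}

def url_topics(url):
    """Return the site and the list of topics (path segments before the page name)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return parts.netloc, segments[:-1]

def team_of(url):
    """First step: look for a team among the topics. Second step: use the lookup table."""
    _, topics = url_topics(url)
    for topic in topics:
        if topic.lower() in TEAMS:
            return topic.lower()
    return EXTRA_PAGES.get(url)

site, topics = url_topics(
    "http://www-sop.inria.fr/axis/personnel/Doru.Tanasa/doru-eng.html"
)
```

Both example URLs from the text are assigned to axis: the first through its topic 1, the second through the lookup table.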

After the data generalization step, and in order to cluster the obtained topics, we needed to compute the dissimilarity matrix used as input for the AHC and our 2-3 AHC algorithms. For this, we used the Jaccard similarity index on the visited topics as defined in [4]. As in [4], we represented each navigation by a binary vector of the visited topics: position $i$ in the vector is 0 if topic $i$ was not visited and 1 if topic $i$ was visited during the navigation. Based on these vectors, and aiming to define a similarity/dissimilarity between two topics $R_i$ and $R_j$, we define the four following quantities:

- $a$ as the number of navigations $k$ for which $R_i^k = R_j^k = 1$,
- $b$ as the number of navigations $k$ for which $R_i^k = 0$ and $R_j^k = 1$,
- $c$ as the number of navigations $k$ for which $R_i^k = 1$ and $R_j^k = 0$,
- $d$ as the number of navigations $k$ for which $R_i^k = 0$ and $R_j^k = 0$.

Then the similarity between two topics $R_i$ and $R_j$ is computed using:

$$S(R_i, R_j) = \frac{a}{a + b + c},$$

which represents the probability of visiting both topics when at least one of them is visited. The dissimilarity matrix used as input for the classical AHC and our 2-3 AHC was computed using the dissimilarity:

$$\delta(R_i, R_j) = 1 - S(R_i, R_j).$$
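The computation of the dissimilarity matrix from the binary navigation vectors can be sketched as follows. The function names and the four invented navigations are assumptions, as is the convention of returning a dissimilarity of 1 when neither topic is ever visited (a case the formula leaves undefined).

```python
def jaccard_dissimilarity(navigations, i, j):
    """1 - a / (a + b + c) for topics i and j over binary navigation vectors."""
    a = sum(1 for v in navigations if v[i] == 1 and v[j] == 1)
    b = sum(1 for v in navigations if v[i] == 0 and v[j] == 1)
    c = sum(1 for v in navigations if v[i] == 1 and v[j] == 0)
    if a + b + c == 0:
        return 1.0  # assumption: topics never visited are maximally dissimilar
    return 1 - a / (a + b + c)

def dissimilarity_matrix(navigations):
    """Square matrix of Jaccard dissimilarities between all pairs of topics."""
    n_topics = len(navigations[0])
    return [
        [jaccard_dissimilarity(navigations, i, j) for j in range(n_topics)]
        for i in range(n_topics)
    ]

# Four invented navigations over three topics (1 = topic visited).
navs = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]
matrix = dissimilarity_matrix(navs)
```

For topics 0 and 1, two navigations visit both and two visit exactly one of them, so the similarity is 2/4 and the dissimilarity 0.5. The quantity $d$ plays no role in the index.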

4 Results

For our first analysis, we have focused on the distribution of research teams among the topics visited across servers. We have selected from $Per_1$ all cross-server navigations (visiting both Web sites: the main one and Sophia's), and then clustered all the visited topics obtained using our 2-3 AHC algorithm. Table 1 presents the distribution of only the research team topics in the obtained clusters (the other topics are not presented here). Also, we did not represent the one-element clusters (the "outliers"): caiman 4B, saga 2B, meije SOP 1C, sysdys SOP 4B, chir 4A, cafe 2B, codes 2B, visa SOP 4A, tropics 1A, omega 4B. After the name of each research team we added its theme and sub-theme, as well as its site (empty for the main site and "SOP" for Sophia's site).

Cluster 1: epidaure SOP 3B, odyssee SOP 3B, epidaure 3B, ariana SOP 3B, reves SOP 3B, ariana 3B
Cluster 2: icare 4A, miaou SOP 4A, miaou 4A, chir SOP 4A, comore 4A, caiman SOP 4B
Cluster 3: orion 3A
Cluster 4: prisme SOP 2B, prisme 2B
Cluster 5: koala SOP 2A, koala 2A, croap SOP 2A, croap 2A
Cluster 6: odyssee 3B, dream SOP 3A, lemme 2A, opale SOP 4B, opale 4B, certilab 2A, pastis 3B
Cluster 7: orion SOP 3A, acacia SOP 3A, acacia 3A, axis SOP 3A, orion 3A, aid SOP 3A, aid 3A
Cluster 8: coprin SOP 2B, saga SOP 2B, saga 2B
Cluster 9: sinus SOP 4B, sinus 4B, smash SOP 4B
Cluster 10: robotvis SOP 3B, robotvis 3B, odyssee SOP 3B
Cluster 11: mimosa SOP 1C, mimosa 1C, tick SOP 1C, tick 1C
Cluster 12: sloop SOP 1A, sloop 1A, oasis SOP 2A, oasis 2A
Cluster 13: rodeo SOP 1B, rodeo 1B, planete SOP 1B, planete 1B
Cluster 14: lemme SOP 2A, tropics SOP 1A, mascotte SOP 1B, omega SOP 4B, galaad SOP 2B, cafe SOP 2B, certilab SOP 2A
Cluster 15: mistral SOP 1B, mistral 1B
Cluster 16: mefisto SOP 4B, mefisto 4B
Cluster 17: mascotte SOP 1B, mascotte 1B
Cluster 18: safir SOP 2B, safir 2B
Cluster 19: meije SOP 1C, meije 1C

Table 1. Clustering of INRIA's Web site topics using the 2-3 AHC for $Per_1$

As we can see, the distribution of research teams usually corresponds to their theme membership (16 out of the 19 non-trivial clusters contain research teams from the same theme). Also, old research teams that have been replaced by new research teams are in the same clusters as the corresponding new ones, since their pages are strongly interconnected. For example: aid was replaced by axis (cluster 7), rodeo by planete (cluster 13), etc.

Fig. 5. 2-3 hierarchy on theme 3 projects during $Per_1$

[...] Section 3.2, only those visiting at least one page on INRIA's main server. From these navigations, we chose to analyze the visited topics (research teams) only from the main server pages and to cluster only the research team topics for theme 3 ($Per_1$ and $Per_2$) and for theme Cog ($Per_2$).

Thus, we have first selected only those navigations containing at least one visit of the theme 3 pages on INRIA's main server during $Per_1$. From these navigations, we have clustered the visited topics on the main server only, corresponding to research teams from theme 3 (Figure 5). In the resulting classification we can distinguish two main clusters (82 and 85) that group almost all elements from the two sub-themes of theme 3 (see Appendix A).

Fig. 6. 2-3 hierarchy on theme 3 projects during $Per_2$

Fig. 7. 2-3 hierarchy on theme Cog projects during $Per_2$

Next, we have selected for the second period the navigations visiting at least one of the new themes' pages on INRIA's main server. From these navigations we have focused again on the topics corresponding to research teams from the "old" theme 3, only on the main server's pages (cf. Figure 6). We refer in this classification to the new theme Bio, which groups together in cluster 73 the four [...] they were either replaced or stopped, but their Web pages are still accessible on the Internet (e.g. sharp, opera, verso).

Finally, we have selected only the navigations that had at least one visit of the new theme Cog pages during $Per_2$. We have then clustered only the topics on the main server's pages corresponding to a research team from theme Cog (cf. Figure 7). We note that users tend to visit all Cog A research teams, while for the other sub-themes there is a certain variability and a deeper analysis is needed (possible causes: events such as conferences or seminars presented on INRIA's Web site that affect users' visits, the recent change of structure, etc.). We also found that our research team, AxIS, is grouped with different research teams in the two time periods.

Fig. 8. Classical hierarchy on theme Cog projects

Fig. 9. 2-3 hierarchy on theme Cog projects

In our final analysis we have compared the classical AHC method and our 2-3 AHC algorithm, by clustering the research teams from theme Cog (during $Per_2$). The data selection was the same as in the previous analysis: navigations visiting at least one of the theme Cog pages, topics only from the main server's pages and representing research teams from theme Cog.

Figures 8 and 9 present a partial output (containing all 3B research teams) of the classical AHC and of our 2-3 AHC algorithm, respectively. The 2-3 hierarchy obtained contains more created clusters than the classical hierarchy (22 against 15), and thus more information. For example, analyzing cluster 54 in Figure 9, we can say that the research teams ariana, epidaure and odyssee have a "stronger" probability of being visited together, compared with the one given by the classical [...]

This paper presents the first application of a 2-3 AHC algorithm on Web usage data, and shows the potential of our algorithm compared with the classical AHC algorithm. In our study, we were interested in clustering the visited topics of INRIA's Web site, which recently (1st April 2004) changed its structure.

We have studied the impact of INRIA's Web site structure on users' navigations, during two time periods (before and right after the site structure change). Although the second analyzed period was shortly after the change, we have found that users' navigations are usually influenced by the Web site structure.

Our ongoing and future work concerns the following topics:

- a deeper analysis of the comparison between 2-3 AHC and AHC on the same Web usage data, using different aggregation links,
- a better dissimilarity measure, for example the generalized Jaccard index, which takes into account the number of visited pages for a topic, not just its presence (count vs. binary),
- the application of our 2-3 AHC algorithm on other data inferred from the activity reports of INRIA's research teams, and the comparison of the results with the ones obtained here.

Acknowledgements: We would like to thank Mihai Jura for his help on the preprocessing phase and Sophie Honnorat for her help with the English corrections.

A INRIA research teams organization

Before 1st April 2004, INRIA's research teams were organized in four different research themes, namely:

- Theme 1: Networks and systems:
  - A: Architectures and Systems,
  - B: Networks and Telecommunications,
  - C: Distributed and Real-Time Programming.
- Theme 2: Software engineering and symbolic computing:
  - A: Semantics and Programming,
  - B: Algorithms and Computational Algebra.
- Theme 3: Human-computer interaction, image processing, data management, knowledge systems:
  - A: Databases, Knowledge Bases, Cognitive Systems,
  - B: Vision, Image Analysis and Synthesis.
- Theme 4: Simulation and optimization of complex systems:
  - A: Control, Robotics, Signal,
  - B: Modelling and Scientific Computing.

After this date, the research teams were reorganized in the following five research themes:

- [...]
  - A: Distributed systems and software architecture,
  - B: Networks and telecoms,
  - C: Embedded systems and mobility,
  - D: Architecture and compiling.
- Theme Cog: Cognitive systems:
  - A: Statistical modeling and machine learning,
  - B: Perception, indexing and communication for images and video,
  - C: Multimedia data: interpretation and man-machine interaction,
  - D: Image synthesis and virtual reality.
- Theme Sym: Symbolic systems:
  - A: Reliability and safety of software,
  - B: Algebraic and geometric structures, algorithms,
  - C: Management and processing of language and data.
- Theme Num: Numerical systems:
  - A: Control and complex systems,
  - B: Grids and high-performance computing,
  - C: Optimization and inverse problems for stochastic or large-scale systems,
  - D: Modeling, simulation and numerical analysis.
- Theme Bio: Biological systems:
  - A: Modeling and simulation in biology and medicine.

References

1. P. Bertrand. Set systems for which each set properly intersects at most one other set - Application to Cluster Analysis. Research Report Ceremade 0202, Université Paris-9, France, 2002.

2. S. Chelcea, P. Bertrand, and B. Trousse. Agglomerative 2-3 Hierarchical Clustering: theoretical improvements and tests. In 27th Annual Conference of the Gesellschaft für Klassifikation, Cottbus, Germany, 12-14 March 2003.

3. S. Chelcea, P. Bertrand, and B. Trousse. Un Nouvel Algorithme de Classification Ascendante 2-3 Hiérarchique. In Reconnaissance des Formes et d'Intelligence Artificielle (RFIA 2004), Centre de Congrès Pierre BAUDIS, Toulouse, 28-30 January 2004.

4. A. El Golli, B. Conan-Guez, F. Rossi, D. Tanasa, B. Trousse, and Y. Lechevallier. Les cartes topologiques auto-organisatrices pour l'analyse des fichiers logs. In 11èmes Rencontres de la Société Francophone de Classification, Bordeaux, 8-10 September 2004. To appear.

5. A. R. Johnson and D. W. Wichern. Applied Multivariate Statistical Analysis, chapter 12. Prentice Hall, 1982.

6. R. R. Sokal and P. H. A. Sneath. Principles of Numerical Taxonomy. Freeman, San Francisco, 1963.

7. D. Tanasa and B. Trousse. Advanced data preprocessing for intersites web usage mining. IEEE Intelligent Systems, 19(2):59-65, March-April 2004.
