
Clustering Libraries of Compounds into Families: Asymmetry-Based Similarity Measure to Categorize Small Molecules

HAL Id: hal-00636849

https://hal.archives-ouvertes.fr/hal-00636849

Submitted on 9 Nov 2012


Samuel Wieczorek, Samia Aci, Gilles Bisson, Mirta Gordon, Laurence Lafanechere, Eric Maréchal, Sylvaine Roy

To cite this version:

Samuel Wieczorek, Samia Aci, Gilles Bisson, Mirta Gordon, Laurence Lafanechere, et al.. Clustering Libraries of Compounds into Families: Asymmetry-Based Similarity Measure to Categorize Small Molecules. N.S. Yang. Bioinformatics - Computational Biology and Modeling, InTech, pp.229-244, 2011, 978-953-307-875-5. ⟨hal-00636849⟩



1. Introduction

A small chemical compound that specifically activates or inhibits a given biological system can become a lead candidate in a drug discovery perspective, or a molecular tool for biological research (Stockwell 2000; Mayer 2003; Inglese et al. 2007; Maréchal 2008). The identification and production of such compounds is thus a major issue for biological or therapeutic research. Strategies for small molecule discovery rely largely on high throughput screening (HTS) of chemical libraries, which has traditionally been the purview of industry for the past twenty years and has recently become available in academic institutions (Stein 2003; Fox et al. 2006). Such high throughput approaches use robotic handling of miniaturized biological assays and allow the screening of a large number of compounds to select those (called "hits") that produce the wanted and reproducible effect on a given biological target (e.g. an enzyme or a whole cell). The size of available compound collections to screen is rather large: for instance, the ChemNavigator database, which proposes commercially available screening compounds from international chemical suppliers, currently tracks over […]6.7 million chemical samples. Among them, over […].9 million are claimed to be unique. However, such an amount is still small relative to the size of the chemical space: the number of synthesizable compounds is estimated to range from 10^18 to 10^[…]00 compounds (Parker and Schreyer 2004).

The screening of a very large chemical library can be financially expensive and time consuming, and the amount of biological material needed might simply not be realistic. Biologists must often lower their ambitions and select a limited number of molecules to assay. The design of relevant chemical libraries, often called "core libraries" since they are supposed to accurately reflect the diversity of a very large collection (Dubois et al. 2008), is thus a central issue for screening. The automatic clustering of chemical compounds can generate homogeneous subsets based on a similarity measure and allows a rational definition of a core library (Willett 1998).

It has been demonstrated that structurally similar compounds are very likely to have a similar biological activity (Martin et al. 2002). Thus, not only should the clustering allow the identification of compound subsets from which one representative molecule can be selected to be screened, but this method can also help to increase the diversity of in-house data sets by selecting additional compounds in otherwise tiny clusters of molecules. Moreover, it has been shown that it could be useful to select molecules which come from each cluster containing a "hit" compound; those "related" compounds should then be tested through further validation screening stages (Engels et al. 2000).

Clustering can also be used in a virtual screening approach to select relevant virtual libraries (prior to a docking process, for instance) or to select the more promising and diverse molecules (after a docking process) to be tested in vitro.

Chemical compound clustering, like any object clustering, implies four steps (Downs and Barnard 2002):

(1) Identification of relevant descriptors for these objects;
(2) Selection and computation of a similarity (or a distance) measure;
(3) Use of a clustering algorithm to gather objects according to this distance or similarity;
(4) Analysis and qualification of the results.

Molecules are structurally complex objects; it is therefore obvious that the clustering quality relies strongly on the capacity of the distance measure to embrace both the structural likenesses and dissimilarities. In this chapter, we focus on the efficiency of an adaptation, for small molecular objects, of a similarity index initially proposed in Inductive Logic Programming (Wieczorek et al. 2006). We compare this novel method with some other structural distances that are customary in chemistry or which have been recently proposed for molecular graph comparison.

2. Methods for the computation of structural distances between molecules

The computation of structural distances between molecules (represented by graphs) directly or indirectly implies the search for isomorphic partial graphs. Generally, methods use a linearization (SMILES language from Weininger 1988) or a structure propositionalization of the compounds. Thus, a molecule is represented by a vector of descriptors, each one corresponding to a molecular fragment (Leach and Gillet 2003). Recently, kernel functions, comparable to distances between graphs (Gärtner et al. 2003), have been proposed in the Support Vector Machines (SVM) context. They present good performances in supervised machine learning to predict molecular bio-activity (Mahé et al. 2005) or to solve bioinformatics problems (Menchetti et al. 2005). In these approaches, molecular representation is global: a set of paths (i.e. molecular fragments specifically chosen or drawn by chance) is built explicitly or implicitly. It is also possible to value structural distances by dynamically building molecular fragments according to the matching between two molecules. This approach is proposed by Fröhlich et al. (2005) in a so-called "global matching" kernel to predict bio-activity. We focus here on a similar strategy with a similarity index, Ipi, based on the comparison of labelled trees or substructures, that allows the classification of molecules in an unsupervised machine learning approach. A short description of the kernel functions used in this comparative study and a deeper explanation of the principles of the Ipi similarity index are given in the following parts.

2.1 Kernel functions

Kernel functions are at the basis of machine learning methods using Support Vector Machines (SVM) approaches. These functions allow working on initial data as if they were in a high-dimensional space without having to transform them explicitly; moreover, kernel methods handle non-linear complex tasks using linear methods in this new space. The main advantage is that the data may be more easily separable in a high-dimensional space (the so-called "feature space") than in the original space. This trick is at the basis of kernel machines. SVM and kernel functions were detailed by Shawe-Taylor and Cristianini (2004) and Schölkopf and Smola (2002).

2.1.1 Tanimoto kernel

This kernel (Ralaivola et al. 2005) is the transformation of the classical Tanimoto distance (Willett 1998; Flower 1998) into a kernel function. Molecules are seen as vectors where each dimension is associated with a given molecular fragment and the coordinate indicates whether this fragment exists or not in the molecule. To build these vectors, it is necessary to give the maximum length of the considered molecular fragments. This can be defined either by allowing the selection of paths of any length from 1 up to a given maximum, or by allowing only the selection of paths of an exact length u.
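As a minimal illustration of the fragment-vector view described above, the sketch below computes the classical Tanimoto coefficient between two molecules given as sets of fragments. It is not the kernel implementation of Ralaivola et al.: the function name `tanimoto` and the fragment labels are hypothetical, and the enumeration of paths from the molecular graph is assumed to have been done beforehand.

```python
def tanimoto(fragments_a, fragments_b):
    """Tanimoto coefficient between two sets of molecular fragments."""
    a, b = set(fragments_a), set(fragments_b)
    union = a | b
    if not union:
        # Assumption of this sketch: two empty descriptions count as identical.
        return 1.0
    # Shared fragments divided by all fragments present in either molecule.
    return len(a & b) / len(union)


# Hypothetical path fragments; the labels are illustrative only.
mol_1 = {"C", "O", "C-C", "C-O", "C-C-O"}
mol_2 = {"C", "O", "C-O"}
print(tanimoto(mol_1, mol_2))
```

Binary presence/absence vectors and sets are equivalent here, which is why a set intersection and union are enough.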

2.1.2 Weighted Decomposition kernel (2D-WD kernel)

In this kernel (Menchetti et al. 2005), molecules are represented by the set of all possible subgraphs which can be built for a given maximum depth. The kernel function between two molecules weights the exact kernel between each pair of atoms according to the structural information. The latter corresponds to the subgraphs that contain all the paths, up to the given depth, built from each atom of the pair.

2.1.3 Optimal Assignment Kernel (OA Kernel)

The Optimal Assignment Kernel (Fröhlich et al. 2005) is based on a dynamical and local graph exploration. Unlike the Tanimoto Kernel, the relational structure of molecules is clearly conserved in its representation. The kernel computation is divided into two steps that are conceptually close to the ones proposed by Bisson (1995). The first step evaluates a distance between each pair of atoms taken from the two molecules, thanks to a kernel function that takes into account the width of each atom's neighbourhood. The second step matches the atoms of the first molecule with the atoms of the second one in order to maximize the sum of these kernel values, which amounts to finding a maximum weight matching in a bipartite graph.

2.2 Structural similarity Ipi Index

This similarity index is an adaptation to chemical structures of the index proposed by Bisson (1995) and Wieczorek et al. (2006).

2.2.1 General principles

Each molecule $M$ is described as a non-oriented graph defined by a pair $(A, B)$ where:

• $A$ corresponds to the atoms $\{a_1, \dots, a_n\}$ of molecule $M$;

• $B$ corresponds to the covalent bonds $\{b_1, \dots, b_m\}$ between these atoms.

Many similarity coefficients usually used in computational chemistry are based on the size (in number of atoms) of the Maximum Common Substructure (MCS) between two molecules $M$ and $M'$ (Bunke and Shearer 1998). One example is the similarity $S_1$, which is defined as the relative size of the MCS compared to the size of the biggest molecule:

$$S_1(M, M') = \frac{|MCS(M, M')|}{\max(|M|, |M'|)} \qquad (1)$$

The similarity coefficient $S$ proposed here is also based on the MCS but is defined as the mean of two dual asymmetric values, $Ipi(M, M')$ and $Ipi(M', M)$:

$$S(M, M') = \frac{1}{2}\left(Ipi(M, M') + Ipi(M', M)\right) \qquad (2)$$

where the value $Ipi(M, M')$ is the relative similarity of $M$ towards $M'$, i.e. the degree of inclusion of $M$ into $M'$. It is defined as the relative size of the MCS between $M$ and $M'$ compared to the size of molecule $M'$:

$$Ipi(M, M') = \frac{|MCS(M, M')|}{|M'|} \qquad (3)$$

This trick allows comparing molecules having big differences in size. For instance, if $M$ is smaller than $M'$ and $M$ is nearly included in $M'$, then, from the point of view of $M$, the molecule $M'$ is very similar since it contains the same information, and we have $Ipi(M, M') \approx 1$, whereas a classical symmetric similarity would reflect this difference in size. The use of the mean of both inclusion values ($Ipi(M, M')$ and $Ipi(M', M)$) leads to a more realistic similarity measure that allows breaking the size bias and focusing more deeply on the existence of common substructures, as shown in Figure 1.

Fig. 1. Difference between a classical similarity measure and an asymmetric-based one. In this graph, $S_1$ (plain line) is a classical similarity measure and $S$ (dashed line) a non-symmetric-based one. This graph shows the behaviour of both $S$ (equation 2) and $S_1$ (equation 1) between a molecule $M$ of size 15 and another molecule $T$ of increasing size: $S = 0.5 \times (9/15 + 9/|T|)$ and $S_1(M, T) = 9/\max(15, |T|)$. The size of the MCS is 9. One can observe that $S_1$ remains constant for all molecules whose size is lower than $|M|$. In other words, the similarity $S_1$ confounds those molecules and deems them all similar with respect to their size, which is not the case for $S$ because there is no plateau.
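The behaviour described for Figure 1 can be reproduced numerically. The sketch below is only an illustration under the figure's stated assumptions (a molecule of size 15, an MCS of size 9, a second molecule of varying size); the function names `s1` and `s_mean` are hypothetical, and the two inclusion terms of equation (2) are written directly as the MCS size divided by each molecule's size.

```python
def s1(mcs, size_m, size_t):
    """Equation (1): MCS size relative to the biggest molecule."""
    return mcs / max(size_m, size_t)


def s_mean(mcs, size_m, size_t):
    """Equation (2): mean of the two asymmetric inclusion values."""
    return 0.5 * (mcs / size_m + mcs / size_t)


# Molecule M of size 15, MCS of size 9, second molecule T of increasing size.
for size_t in (9, 12, 15, 20, 30):
    print(size_t, round(s1(9, 15, size_t), 3), round(s_mean(9, 15, size_t), 3))
```

For every size of T up to 15 the first column of values stays at 0.6 (the plateau), while the second keeps decreasing as T grows.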

The degree of inclusion $Ipi(M, M')$ is computed in two steps: a local one and a global one.

The first, local step aims at computing a local similarity between each pair of atoms $(a_i, a'_j)$ belonging respectively to molecules $M$ and $M'$. The values are stored in a matrix called SUB; SUB$(a_i, a'_j)$ lies in $[0, 1]$. The same processing is followed for $M'$ towards $M$ and the results are stored in a matrix named SUB'. The key idea is to consider that two atoms $a_i$ and $a'_j$ are more similar if they share common physicochemical properties, but also if the neighbouring atoms to which they are connected by covalent bonds are themselves similar to each other. This recursive definition allows expressing the problem in the form of a non-linear equation system; the resolution of this system consists in the search for a fixed point.

The goal of the second step is to compute a global inclusion between both molecules $M$ and $M'$, i.e. the value of $Ipi(M, M')$. Having local similarity values for each atom pair $(a_i, a'_j)$, and according to the structural connectivity of $M$, we search for the matching that maximizes the global inclusion, which can be approximated by the biggest common tree or substructure between the two molecules. Once both $Ipi(M, M')$ and $Ipi(M', M)$ are computed, the total similarity between both molecules is the mean of both values. This mean is then used by the clustering algorithm.

2.2.2 Computation of local similarity (between atoms)

The aim is to compute the value of each element of the matrix SUB, which corresponds to the local similarity between one atom $a_i$ of $M = (A, B)$ and one atom $a'_j$ of $M' = (A', B')$ (see the example in Figure 3). It quantifies the inclusion degree of the environment of the atom $a_i$ in that of the atom $a'_j$, using the following functions:

$sim_A : A \times A' \to [0, 1]$, the similarity between two atoms according to their respective physicochemical properties;

$sim_B : B \times B' \to [0, 1]$, the similarity between two covalent bonds according to their respective physicochemical properties;

$sim : A \times B \times A' \times B' \to [0, 1]$, the similarity between two pairs (atom, bond);

NbLinks, the number of covalent bonds of a given atom $a_i$;

Links, a function returning, for a given atom $a_i$, the list of the covalent bonds of $a_i$;

Ngbr $: A \times B \to A$, a function which gives, for a given atom $a_i$ and a given bond $b_{ik}$, the neighbouring atom $a_k$ which is connected to $a_i$ by $b_{ik}$.

The inclusion degree is stored in SUB$(a_i, a'_j)$. Its computation comes down to building a system of non-linear equations where SUB$(a_i, a'_j)$ is one of the variables to compute. The resolution of this system is obtained by using the Jacobi iterative method. At each iteration, we have the following equations:

$$sim(a_i, b_{ik}, a'_j, b'_{jl}) = \frac{1}{2} \times \left( SUB[Ngbr(a_i, b_{ik}), Ngbr(a'_j, b'_{jl})] + sim_B(b_{ik}, b'_{jl}) \right) \qquad (4)$$

$$SUB[a_i, a'_j] = \frac{1}{2} \left( sim_A(a_i, a'_j) + \frac{maxMatchScore}{NbLinks(a_i)} \right) \qquad (5)$$

maxMatchScore is computed according to the following processing. Let us define a matching function $F$. For two given atoms $a_i$ and $a'_j$, $F$ searches for the optimal mapping between the neighbours of $a_i$ and $a'_j$, and so between the corresponding covalent bonds. $F$ returns the score of this optimal mapping, that is maxMatchScore.

This function is a classic matching problem. Indeed, let $B_i$ (resp. $B'_j$) be the list of the covalent bonds in which the atom $a_i$ (resp. $a'_j$) appears. Bonds which are elements of $B_i$ and $B'_j$ can be considered as bipartite graph elements; thanks to $sim$, the similarity of each quadruplet $(a_i, b_{ik}, a'_j, b'_{jl})$ is known. Thus, finding the best matching between these elements boils down to maximizing the sum of the $sim$ values, i.e. to solving a maximum weight matching in a bipartite graph. maxMatchScore, the corresponding matching score, is equal to the sum of the $sim$ values.

Fig. 2. Summary of the information used to compare two given atoms in a molecule.

To find this optimal matching, we use the Hungarian algorithm, also called the Kuhn-Munkres algorithm (Kuhn 1955; Munkres 1957), whose complexity is $O(n^3)$. This is not a problem since the lists $B_i$ and $B'_j$ are rather small: their size corresponds to the valence of the considered atom (for instance 4 for the carbon atom). Thus, the local similarity between two atoms is defined recursively, and the originality of this approach lies in the fact that the similarity computed for two atoms corresponds to the average of their physicochemical similarity $sim_A$ and the normalized average of the similarity of their neighbourhoods (see Figure 2).

In equation (5), maxMatchScore is normalized by NbLinks$(a_i)$. This gives its asymmetric nature to SUB$(a_i, a'_j)$, whereas a division by $\max(NbLinks(a_i), NbLinks(a'_j))$ would have kept a symmetric nature for this similarity measure. From a practical point of view, we use the Jacobi iterative method in a synchronous way, i.e. we use two instances of the matrix SUB, one for the iteration $i$ and one for the iteration $i+1$: all the values of the matrix SUB$_{i+1}$ are computed using the terms of the matrix SUB$_i$, so the values for all atoms are simultaneously changed when SUB$_{i+1}$ is copied into SUB$_i$. The number of iterations characterizes the depth of the information propagation, i.e. the neighbourhood size taken into account to compare two atoms. Figure 3 shows an example of this propagation.

Fig. 3. Computation and evolution of the matrix SUB. Example with the molecule M being formamide (CONH3) and the molecule M’ being glycol-aldehyde (C2H4O2).
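The synchronous iteration of equations (4) and (5) can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the dictionary-based molecule encoding, the function names, the use of heavy atoms only, the choice of $sim_A$ as the starting matrix, the value 0 given to the neighbourhood term of an atom without bonds, and the brute-force search standing in for the Hungarian algorithm (acceptable here because bond lists are tiny) are all assumptions of this sketch.

```python
from itertools import permutations


def neighbours(mol, i):
    """Return [(neighbour_atom, bond_type), ...] for atom i."""
    out = []
    for (a, b), bond_type in mol["bonds"].items():
        if a == i:
            out.append((b, bond_type))
        elif b == i:
            out.append((a, bond_type))
    return out


def max_match_score(sub, links_m, links_p):
    """Best sum of sim values (equation 4) over one-to-one bond pairings."""
    def sim(link, link_p):
        (n, t), (n_p, t_p) = link, link_p
        sim_b = 1.0 if t == t_p else 0.0          # minimal bond similarity
        return 0.5 * (sub[n][n_p] + sim_b)

    small, large, swapped = links_m, links_p, False
    if len(small) > len(large):
        small, large, swapped = links_p, links_m, True
    best = 0.0
    # Brute force replaces the Hungarian algorithm in this sketch.
    for perm in permutations(large, len(small)):
        total = sum(sim(y, x) if swapped else sim(x, y)
                    for x, y in zip(small, perm))
        best = max(best, total)
    return best


def local_similarity(m, m_p, iterations=5):
    """Jacobi-style computation of the SUB matrix of m towards m_p."""
    atoms, atoms_p = m["atoms"], m_p["atoms"]
    sim_a = [[1.0 if a == b else 0.0 for b in atoms_p] for a in atoms]
    sub = [row[:] for row in sim_a]               # assumed starting point
    for _ in range(iterations):
        new = []
        for i in range(len(atoms)):
            links = neighbours(m, i)
            row = []
            for j in range(len(atoms_p)):
                score = max_match_score(sub, links, neighbours(m_p, j))
                env = score / len(links) if links else 0.0
                row.append(0.5 * (sim_a[i][j] + env))   # equation (5)
            new.append(row)
        sub = new                                 # synchronous update
    return sub


# Heavy atoms only (hydrogens omitted in this sketch).
formamide = {"atoms": ["C", "O", "N"],
             "bonds": {(0, 1): "double", (0, 2): "single"}}
glycolaldehyde = {"atoms": ["C", "C", "O", "O"],
                  "bonds": {(0, 1): "single", (0, 2): "double", (1, 3): "single"}}
for row in local_similarity(formamide, glycolaldehyde):
    print([round(v, 3) for v in row])
```

Because each new matrix is built entirely from the previous one, every pass extends the compared neighbourhood by one bond, which matches the propagation described in the text.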

It can be proved that the system always has a solution and that it is found after a few iterations (from 3 to 5 according to the complexity of the molecule). The information propagation decreases as $(0.5)^n$, where $n$ is the distance between neighbours.

Then, using the Hungarian algorithm, the overall complexity for the computation of SUB is in $O(K \cdot n^2 \cdot d^3)$ where:

• $d$ is the mean number of neighbours for each atom in the molecule;
• $K$ is the number of iterations of the iterative procedure;
• $n$ is the mean number of atoms in the molecule.

This complexity may be reduced to $O(K \cdot n^2 \cdot d^2)$ if $d$ is low, by precomputing all the possible matchings between them.

In this work, to have homogeneity with the other compared methods (see Experimental material and tests methodology), the molecular representation is kept minimal:

$sim_A$ depends only on the type of atoms (C, O, N, etc.): $sim_A(a_i, a'_j) = 1$ if atom types are equal, $sim_A(a_i, a'_j) = 0$ otherwise.

$sim_B$ depends only on the type of bonds (simple, double, triple, aromatic): $sim_B(b_{ik}, b'_{jl}) = 1$ if bond types are equal, $sim_B(b_{ik}, b'_{jl}) = 0$ otherwise.

Note that $sim_A$ and $sim_B$ are generic functions. It would then be very easy to integrate more complex properties for atoms and bonds, such as charges, indexes, pharmacophore points, etc. Modifying $sim_A$ and $sim_B$ would be sufficient, without any change in the global algorithm.

2.2.3 Global similarity (between molecules) computation

In order to compute the global similarity between two molecules $M$ and $M'$, $Ipi(M, M')$, we search for the best matching between atoms from $M$ and $M'$. This matching relies on the local similarities stored in the matrix SUB and it maximizes the global inclusion. This can be achieved by using the Hungarian algorithm as in the OA Kernel (Fröhlich et al. 2005), considering that we have a bipartite graph built with the atoms of molecules $M$ and $M'$. Since local similarities can correspond to different matchings, they do not guarantee that the maximum common structure found by the algorithm would be a connected one. However, this is not a real problem in chemistry, since applications such as lead discovery or synthesis design might put a premium on unconnected structural solutions.

The matching is therefore searched according to the following heuristic. The best score of local similarities between atoms that we can find in the matrix SUB is taken as a seed. The matching is then propagated according to the structure connectivity of $M$ and according to the SUB values. In the example shown in Figure 4, $M$ and $M'$ are two molecules, and one atom of each is taken as seed. The neighbours of the seed atom of $M$ are then processed to be matched with the neighbours of the seed atom of $M'$. Their matching can be processed according to a greedy algorithm (pairing according to a decreasing ranking of similarity values) or according to a Kuhn algorithm. In the following experiments, we have chosen a greedy algorithm for this matching. Once these neighbours are paired, their own neighbours are processed in turn, and so on. This matching stops when there are no more atoms to match in $M$ or as soon as an atom of $M$ cannot be matched with an atom of $M'$. It must be noted that:

1. when two matched atoms both have unprocessed neighbours, the order in which these neighbours are processed depends only on the order in which the atoms are stored in our implementation structure: there is no special criterion for this choice;

2. an atom that has already been matched is not processed again as a neighbour of another atom, but our implementation keeps in mind that a connection exists between them. So, if the corresponding connection also exists in molecule $M'$, the algorithm is able to find that the best match is a cyclic substructure.

Since the seed is only a possible starting point, the procedure is repeated ([…] trials¹), each time taking a different pair of atoms not already matched in a previous trial.

Fig. 4. Example for general principles of the incremental matching.

The complexity for the search of the best match between the two compounds with the use of SUB is in $O(n^3)$, where $n$ is the mean number of atoms in the molecule. As for the local similarity computation, this complexity may be reduced to $O([…])$ by precomputing all the possible matchings for low values of […], which is frequent in chemistry. The best matching between atoms from two molecules $M$ and $M'$ is then used to compute the global inclusion of $M$ in $M'$, $Ipi(M, M')$. This is obtained by the sum of all SUB matrix elements relating to matched atoms of $M$ and $M'$, divided by the number of atoms in $M$. This division by the size of $M$ brings additional asymmetry to this measure. Thus, the overall complexity of this approach is in $O(K \cdot n^2 \cdot d^2)$.

3. Experimental material and tests methodology

In this paragraph, we present the material and methodology for the comparison of the capacity of each similarity measure to return, by classification of simple molecular 2D structures, the chemical families defined by experts.

¹ This number was established experimentally: indeed, for all data sets, the best matching […]

3.1 Chemical libraries

3.1.1 Data sets

Four public chemical datasets published by Sutherland et al. (2003) have been used. These authors gathered compounds which have been tested against four different biological targets; these datasets present the advantage of being already divided into well-defined and precisely described chemical families. The Cox2 library contains a set of 467 molecules tested as inhibitors of the cyclooxygenase-2 and divided into 1[…] families; the Bzr library is a set of 405 ligands for the benzodiazepine receptor divided into 1[…] families; the Dhfr library contains a set of 756 inhibitors of the dihydrofolate reductase divided into 18 families, with […] compounds belonging to singleton families. In this study, these singletons were discarded and only 7[…] compounds divided into 17 families were considered. The Er library is a set of 1009 estrogen receptor ligands: 393 (extracted from the literature) are gathered into three structural families and 616 form a miscellaneous group; for our study, we considered only the three families extracted from the literature.

Molecular bi-dimensional (2D) structures were provided as structure files. For standardization purposes, all the molecules were normalized according to a set of normalization rules that was built in accordance with usual chemical usages. This task was achieved using the ChemAxon software Application Programmatic Interface (www.chemaxon.com), and standardization rules were formally defined as chemical reactions in an XML configuration file read by the ChemAxon Standardizer object. The setup file is available upon request.

In order to compare methods within the same description context, independently of the studied distance, we reduced the set of descriptors associated with each molecular graph to the minimum set which existed in all the compared methods or which was easily integrated in each distance implementation. In this minimal representation, a molecule is an attributed undirected graph $G = (V, E)$. Each vertex $x$ in $V$ represents an atom and is labelled by the atom type (C, O, N, etc.). Each edge $(x, y)$ in $E$ represents a chemical bond; it is characterized by a type and can be "simple", "double", "triple" or "aromatic".

3.3 Kernel functions implementation and clustering algorithm

Mahé et al. (2005) provided the implementation of the Tanimoto Kernel. For each of the two other kernels (2D-WD Kernel and OA Kernel), their respective authors' implementation was used. Once the distance matrices had been computed, molecules were categorized into families using the well-known Ascendant Hierarchical Classification (Johnson 1967); the chosen implementation was taken from the R software (www.r-project.org/) and the Ward index was the inter-class aggregation distance.

3.4 Parameters setting

In the following experiments, for each of the selected methods, the parameter values were optimized for best classification capacities.

For the Tanimoto Kernel, all values (from 5 to […]0) were tested for the maximum-length parameter (all paths of length from 1 up to this maximum) and for u (all paths of exact length u). For each database, the parameter (either one) and its associated value that gave the best result were retained (details can be given upon request). For the 2D-WD Kernel, the OA Kernel and Ipi, the main parameter is the width of the neighbourhood, which is taken into account to evaluate the similarity between two atoms. In Ipi, this parameter corresponds to the number of iterations used to compute the matrix of similarities between all pairs of atoms. A value of 5 was sufficient in Ipi to see the convergence of the SUB matrix. Thus, this value was selected for the corresponding parameters in the OA Kernel and the 2D-WD Kernel.

3.5 Classification evaluation

As emphasized by Candillier et al. (2006), classification evaluation is difficult without any validation criteria. This is not the case here since we know, for each dataset, the number and precise content (in terms of molecules) of the families (or classes) that the system must retrieve. To evaluate the difference between the original classification and the learnt one, a confusion matrix was used (Kohavi and Provost 1998). Its lines (indexed by $i$, varying from 1 to $p$) represent the original classes and its columns (indexed by $j$, varying from 1 to $q$) represent the classes built by the classification system. Each matrix element $n_{i,j}$ represents the number of molecules which are present in both the original class $i$ and the built class $j$. The built classification is optimal when there is only one value $n_{i,j}$ different from 0 for each line and each column. So, a simple way to qualify the classification is to measure the average entropies associated with lines and columns. Two indices based on conditional entropies were considered: the Confusion Index (CI), which quantifies the number of merged classes, and the Segmentation Index (SI), which quantifies the number of split classes. Given:

$$C_i = \sum_{j=1}^{q} n_{i,j}, \qquad L_j = \sum_{i=1}^{p} n_{i,j}, \qquad N = \sum_{i=1}^{p} \sum_{j=1}^{q} n_{i,j} \qquad (6)$$

we define:

$$CI = -\sum_{i=1}^{p} \frac{C_i}{N} \sum_{j=1}^{q} \frac{n_{i,j}}{C_i} \times \log_2 \frac{n_{i,j}}{C_i} \qquad (7)$$

$$SI = -\sum_{j=1}^{q} \frac{L_j}{N} \sum_{i=1}^{p} \frac{n_{i,j}}{L_j} \times \log_2 \frac{n_{i,j}}{L_j} \qquad (8)$$

The most important index is CI since it indicates whether the initial classification has been well retrieved by the algorithm. The lower its value is, the better is the matching between both classifications.
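Equations (6) to (8) translate directly into a few lines of code. The sketch below is only an illustration of the formulas as reconstructed here: the function names are hypothetical, the confusion matrix is a plain list of lists of counts, and empty cells are skipped on the usual convention that $0 \times \log_2 0 = 0$.

```python
from math import log2


def conditional_entropy(rows):
    """Weighted mean entropy of the rows of a table of counts."""
    total = sum(sum(r) for r in rows)
    result = 0.0
    for r in rows:
        row_sum = sum(r)
        for n in r:
            if n:                      # 0 * log2(0) is taken as 0
                result -= (row_sum / total) * (n / row_sum) * log2(n / row_sum)
    return result


def confusion_index(matrix):
    """Equation (7): entropy computed along the first index."""
    return conditional_entropy(matrix)


def segmentation_index(matrix):
    """Equation (8): the same quantity computed along the second index."""
    return conditional_entropy([list(col) for col in zip(*matrix)])


perfect = [[3, 0], [0, 4]]             # one non-zero value per line and column
spread = [[2, 2]]                      # one line shared equally by two columns
print(confusion_index(perfect), segmentation_index(perfect))
print(confusion_index(spread), segmentation_index(spread))
```

Both indices vanish for the one-to-one table, which is the optimal situation described in the text; a non-zero value appears as soon as a line or a column holds more than one non-zero count.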

4. Comparative analysis of similarity measures

Figure 5 shows the CI index evolution for the four chemical libraries. Except for the Cox2 library, the ranking of the four methods is always the same: Ipi, Tanimoto Kernel, OA Kernel and 2D-WD Kernel. Considering the results for each base, it can be observed that molecules from the Cox2 library belong to close scaffolds (i.e. core graph structures). However, families are easily recognizable because of a few discriminating atoms or chemical functions whose positions vary in aromatic cycles. The Ipi index recognizes nearly all the expected families because it is able to detect each atom's local environment using its local similarity measurement. Ipi categories that do not comply with the experts' chemical families correspond to very small families which have been merged by the classification system. The Tanimoto Kernel also produces a good score, whereas the 2D-WD Kernel and OA Kernel results were less efficient in returning chemical families.

Molecules from the Dhfr library have several similar substructures which are differently connected together from one family to another. Results are the same as for the Cox2 database but with a greater dispersion between the methods. The Ipi index shows its capacity to globally recognize molecular structures.

Structures of molecules from the Bzr library are very diverse and this variability is sometimes rather great within the original families. In this case, all the methods failed to accurately recover the original classification given by chemists.

[Figure 5 panels: Cox2 library, Dhfr library, Bzr library, Er library]

Fig. 5. Comparison of similarity measures. CI index for the four distances used with the HAC clustering method on the four datasets: Cox2, Dhfr, Bzr and Er. The vertical line marks the original number of chemical families and the point where a ranking between the four methods can be done.

Finally, the Er library contains only three families. Each one is characterized by a specific scaffold which should be easily recognized. Indeed, Ipi and the Tanimoto Kernel found again the three expected families (their CI curves are superimposed). On the other hand, the 2D-WD Kernel and OA Kernel performance was again lower.

These approaches correspond to two different strategies: the Tanimoto Kernel and the 2D-WD Kernel represent the molecules by means of paths in the graph, contrary to the OA Kernel and Ipi that […] why, for each library, the Ipi and Tanimoto Kernel methods performed better. In the case of the OA Kernel and Ipi, which use close algorithms, we searched for the main modifications that would explain the difference. To this purpose, the Ipi algorithm was changed to erase its two major differences with the OA Kernel:

• The asymmetrical calculation of the similarity: in equation (5), NbLinks$(a_i)$ was replaced by $\max(NbLinks(a_i), NbLinks(a'_j))$ and, in the global similarity calculation (see section 2.2.3), the size of $M$ was replaced by the maximum value between the sizes of $M$ and $M'$. The measure thus became purely symmetrical.

• The incremental matching taking account of the molecular connectivity: we replaced our matching algorithm (see section 2.2.3) by a Kuhn algorithm looking for a maximum weight matching in a bipartite graph built with the atoms of $M$ and $M'$.

The influence of these modifications and of their combination was studied on the classification of the four chemical databases. The results do not depend on the database; taking into account the molecule connectivity brings the main improvement, while asymmetry slightly improves the overall classification. Figure 6 details these results on the Dhfr database.

Fig. 6. CI index for 5 distances used with the HAC clustering method on the Dhfr dataset: OA Kernel, Ipi and 3 distances obtained by modification of Ipi: symmetric calculation of the similarity index (curve "Ipi + Sym"), or Kuhn algorithm replacing the incremental matching algorithm ("Ipi + Kuhn"), or both modifications ("Ipi + Kuhn + Sym"). Replacing Kuhn's algorithm by an incremental one taking account of the connectivity gives a strong improvement but, whatever the matching algorithm is, the asymmetric nature of the index slightly improves the efficiency of the classification.

In the case of the Tanimoto Kernel and the 2D-WD Kernel, it is the path selection which is important to get an accurate representation of molecules.

This study has been focused on the comparison of several graph kernels applied to chemical compounds in a supervised classification task, that is to say that the families of molecules are known. In this case, one remaining question is the choice of the optimal value for the Confusion Index in order to define the clustering level. In particular, it may be difficult to conclude in areas where CI is quite constant (e.g. for the "Original Ipi" and "Ipi + Sym" methods between 15 and […]0 clusters). However, in a real case, the classification is not known and the Confusion Index could not be computed.

5. Conclusion

The evaluation of molecular libraries, and more specifically molecule categorization into families, is important for biologists and chemists before and after in silico or in vitro molecular screening. In this chapter, we have described some of the important similarity measures currently used and a new similarity index we recently developed for chemical molecule comparison. It is very difficult to choose a similarity measure for a chemoinformatics purpose; besides empirical considerations like the availability in the software suites used in the laboratory, the familiar utilization of a given similarity measure in a team or the demonstrated efficiency in an experimental context leads to the selection of the index used for subsequent analyses. Here, we compared methods on four well-known chemical datasets in order to evaluate the capacity of the algorithms to retrieve the families defined by chemists.

The relatively disappointing results obtained with the 2D-WD Kernel and the OA Kernel seem to indicate that "distances" having good performances in a supervised learning context (activity prediction) are not always adapted to classical clustering algorithms. By comparing Ipi and the OA Kernel, we observed that taking into account the molecular connectivity was important, but also that the distance measures based on asymmetrical comparisons could lead to better results than the ones based on a plain symmetric definition.

To complete this study, it would be interesting to integrate SVM clustering (among others Ben-Hur et al. 2001; Finley and Joachims 2005) and SVM classification (Rupp et al. 2007) instead of the Hierarchical Ascendant Classification, or to test the kernel extension of Mahé et al. (2004). In the case of the asymmetrical measure we introduced here and compared to classical indexes, it is important to further investigate the three steps of Ipi to understand clearly which one(s) is (are) the most important for the Ipi efficiency in comparison with the Tanimoto Kernel or the 2D-WD Kernel. Indeed, in this overview, one could be surprised by the good results of the Tanimoto Kernel, which is clearly less complex to compute than the Ipi index. However, this latter presents two advantages (independently of being ranked first for all the tested libraries): on the one hand, it takes into account all the knowledge about the molecule without needing a linearization, so it is not necessary to manually choose the size of the structural keys to use and there is no loss of structural information; on the other hand, by modifying the $sim_A$ and $sim_B$ functions, it is possible to integrate in the measure all the physical and chemical information that the expert would judge useful.

6. Acknowledgments

The authors thank P. Mahé for fruitful discussions and the use of the Tanimoto Kernel implementation, part of his CPP software. They would like to acknowledge the ChemAxon Company (http://www.chemaxon.com) for allowing academics to freely use their software, especially here the Standardizer to normalize molecules. They are also grateful to J. […] and K. […] for the reading of this manuscript and their suggestions for corrections, which improved it.

Support: this work was partly supported by grants from the French Ministry of Research ("ACI IMPBio" program (Accamba project)) and from the French "Rhône-Alpes Futur" foundation.

Conflict of interest: none declared.

7. References 1 4C4D811197A3D11*++F01AD&&781E45781;ED483)117563DA55A389A29368EA993611 &&11F"[F.1 37111*F>>"012$1B17'1714341B13C3EB83$1C4BD8417817(345C(B4184&844BC 371 $4C11 -6599B8EA 55A (BA 879637853DA 559699A 5A 48DB8EA 3BA 368EA 696 A D36E943D9AH5?D9BE9A439A;>C>='1-HA1&8411!5441*<01&&117[71

;B4EE3481 11 97A 3D11 *++701 ;B5B41 !%BEDB371 71 ;ED483)1 E)783C11 -6599B8EA 55A 2148E31&&11".["/F11

B7'1111B1B8B8U111*++01;ED483)1471B124381D4131;7C&DB37BE1 ;4C38$119E'A52)7'A92'1!"1&&11F[+11

BD(731 U11 97A 3D11 *++/01 ;7EE45371 71 ;7C&7D1 C1 47'1 71 B4BE1 '31 4C1 \A 6697A 52)79648B9BA86EA898E11*01&&11F"7[F7/1

!)4E1 15111 97A 3D11 *+++01 ;4848DA:1 1 A$4C1 AD&&783)1 41 A4?D43BE1 A58443)1 @8754117'A92'A15'A52)7'A8'11#1&&11F["11

53E4$1 211B1 U7B53C211 *++"01 AD&48%341 ;ED483)1 '31 AD&&781 E45781 B5341 31

-6599B8EAA55A12'AC5148CB$11&&11F.1[111

5761 A11 97A 3D11 *++701 43)C87D)&D1 58443):1 D&B41 71 &8B53541 B1 D55411 7A C8525DA 69911!!1&&11/7[/7>11 5E7'481B1#11*F>>/01H141&87&4834171(3183)C(B41C4BD8417154C35BE13C3EB83$117'A 92'A15'A52)7'A81$"1&&11./[/711 58FE35141197A3D11*++"011H&3CBE13)C41G484E178183(D417E45DEB818B&11A -65'A55A17'A55'A5A389A29368EA;12=1&&11"[11 B848121197A3D11*++01H1)8B&19484E:14B84184DE1B1435341BE48B3%411-65'A55A 79A 197A 3DA 559699A 5A 52)737853DA 29368EA 956 A 3BA 79A 7A 3DA L56H5)A5A>969DA389'AM98B9D496E1:1A&83)48CE48EB)11

-)E441 U11 97A 3D11 *++.01 43)C87D)&D1 58443)1 BB$1 781 41 34335B371 71 54C35BE1 &87(411.37A92AC85D11$1&&1177[.>11 U771A11*F>7.014348B8535BE1;ED483)1A54C41- 529768H311&&11F["11 G7B%31#11B1@87%7511*F>>/01E7B8$17148C11389A29368E1$#1&&11.F[.1A GD1 41211 *F>""01 241 4D)B83B1 471 781 41 B3)C41 &87(E4C1 <B%BE1 #44B851 7)3351]DB848E$111&&11/[>.11 4B511#11B13EE4E1U11*++011-87D537171;4C7378CB351GED'4815B4C351 @D(E34811 D9841U11*F>".01E)783C1781413)C41B128B&78B371@87(E4C1A7563DA55A 79A5897 A55A1B7683DA3BA))D89BA37923781%1*F01&&11[/111

B41 @11 97A 3D11 *++01 !64371 71 B8)3BE3641 8B&1 G484EA 1A -65'A 55A 79A (17A 179637853DA559699A5A389A29368EA;12=11B1E(48B111

B41 @11 97A 3D11 *++"01 8B&1 9484E1 781 C7E45DEB81 8D5D84CB53%3$1 84EB373&1 '31 D&&781%45781CB534117'A92'A15'A5B9D'1%*01&&11>>[>"F11

B8 5BE1 !11 *++/011 ;4C7)47C35:1 B1 353&E341 B1 41 58787B1 71 3)1 87D)&D1 457E7)341 (37CB89481 844B851 57C(3B783BE1 54C38$1 )47C351


54C378CB351 (37378CB351 B1 B83353BE1 34EE3)45411 524A 92A M8EA 65E)7A699'1!!1&&11"/["/711

B831A1;1197A3D11*++01B71A8D5D8BEE$1A3C3EB817E45DE414B%41A3C3EB8137E7)35BE153%3$1 \17'A9B'A92'11%1&&11"+["/11

45431 A11 97A 3D11 *++"01 243)41 457C&73371 9484E1 -6599B8EA 55A 79A 179637853DA 559699A5A389A29368EA;12='17148CB$1&&1"/"1[1">11

B$48121D11*++01;4C35BE1)4435:1B3E783)177E178154EE1(37E7)$1169BA9DDAC85D11!$1&&11 .+[..11

@B89481 ;1<11 B1 A584$481 A1G11 *++01 &&E35B371 71 ;4C7378CB351 71 43)C 287D)&D1 A58443)1 -:1 975BA 8A 5D9D36A C85D5E 1 65DA ($NA 92585562378NA 59)7A975BA3BA55DA556A86EA885696 'A!341($1U1B378B114DCBB1@841 -51127'7B1<U1&&1/"[FF+11

#BEB3%7EB1 11 97A 3D11 *++"01 8B&1 9484E1 781 54C35BE1 378CB3511 .963DA .97?56H1 !"1 &&11 F+>[FFF+11

#D&&111,1@875B91!1,1A54348111*++.01G484E1&&87B51717E45DEB81A3C3EB83$1B41 71-48B3%418B&1A3C3EB83$A7'A92'A15'A5B9D'1.1&&11/+C/71

A5FE97&111B1AC7EB11U11*++014B83)1'31G484E1241-21&841;BC(83)411 AB'4C2B$E781 U11 B1 ;833B331 <11 *++01 G484E1 471 781 @B481 BE$31

;BC(83)41D3%483$1@8411

A431#111*++0143)C87D)&D158443)131B5B4C3B:1414B8%B8146&4834541171C8525DA 69911"1&&117F"[7F>11

A759'4EE11#11*+++0158734813154C35BE1)44351169BAC85795D11!"1&&11>[""11 AD48EB1 U1U11 97A 3D11 *++01 A&E34C533)1 '31 B1 44351 E)783C:1 1 471 781

B4%4E7&3)1;EB335B371A8D5D84C53%3$1#4EB373&17'A92'A15'A52)7'A8'1 $1&&11F>+7[F>F"11

2433)481 B11 *F>//01 A-!A1 B1 ;4C35BE1 B)DB)41 B1 -78CB371 A$4C11 F11 -87D537171477E7)$1B1!573)1#DE417'A92'A15'A52)7'A8'1"1&&11 F[711

2345678491 A11 97A 3D11 *++701 D33)1 41 A4B851 31 41 <H1 #4)371 71 41 @B41 28B3371 @87(E4C1 '31 B1 @B83BE1 AD(DC&371 2411 1A )6599B8EA 55A 2A (##9'A 2.A %(1(O(##911F/[14&4C(48148E311&&1/F.[/11

Figure

Fig. 3. Computation and evolution of the matrix SUB. Example with the molecule M being formamide (CONH3) and the molecule M’ being glycol-aldehyde (C2H4O2).
Fig. 5. Comparison of similarity measures. CI index for the four distances used with the HAC clustering method on the four datasets: Cox2, Dhfr, Bzr and Er.
