Development and validation of target-based drug design tools: virtual screening of monoamine oxidase inhibitors


Thesis


BRAVO, Juan

Abstract

Several techniques for improving the predictions of a molecular docking tool, GOLD, were developed and validated by comparison with experimental results.

First, the "MLP filter" provides a reliable description of hydrophobicity in protein cavities. Implementing this tool in GOLD increased the success rate by 5% when re-docking 85 ligand-protein complexes from the "Astex Diverse Set".

Second, a virtual screening method based on the selection of ordered water molecules, the use of the MLP filter and pre-optimization of the pocket correctly classified "in silico" the activity of 100 coumarins with respect to their "in vitro" inhibitory potency towards human monoamine oxidase. Very good correlations between experimental results and predictions were found. Finally, various families of compounds were investigated experimentally and provided new molecular scaffolds for MAO inhibition.

BRAVO, Juan. Development and validation of target-based drug design tools: virtual screening of monoamine oxidase inhibitors. Doctoral thesis: Univ. Genève, 2009, no. Sc. 4082

URN : urn:nbn:ch:unige-26265

DOI : 10.13097/archive-ouverte/unige:2626

Available at: http://archive-ouverte.unige.ch/unige:2626

Disclaimer: layout of this document may differ from the published version.


Development and Validation of Target-Based Drug Design Tools: Virtual Screening of Monoamine Oxidase Inhibitors

THESIS

presented to the Faculty of Science of the University of Geneva

for the degree of Doctor of Science, specialization in pharmaceutical sciences

by

Juan Bravo, from Boécourt (JU)

Thesis No. 4082

Geneva

Atelier d’impression ReproMail 2009


This doctoral thesis was carried out from October 2004 to February 2009 in the Pharmacochemistry Group of the University of Geneva, under the supervision of Professor Pierre-Alain Carrupt. I would like to express my gratitude to everyone who contributed, directly or indirectly, to this work.

First of all, my most sincere gratitude goes to Professor Pierre-Alain Carrupt, who initially allowed me to carry out my diploma work at the ICT in Lausanne, before welcoming me into his research team with a stimulating thesis subject. His high-quality scientific advice and his experience were very enriching. I also thank him for the excellent working conditions he offered me and for the trust he placed in me throughout this thesis.

A huge thank-you also goes to Dr. Marianne Reist-Oechslin for her constant support and first-rate scientific supervision, even after she left the Pharmacochemistry Group. Her friendship and guidance were invaluable.

I would also particularly like to thank my two predecessors, Drs. Laura Novaroli-Zanolari and Antoine Daina, who laid the foundations of my research work and supervised me with great kindness and competence during my first years as a doctoral student. Finally, Dr. Antoine Daina was of great help during the writing of this manuscript and the preparation of the thesis defence. My thanks to him once again.


pertinent remarks and their sound advice.

I also thank my colleagues in the Pharmacochemistry Group and at UNIGE. In particular, thanks to Bénédicte, Fabrice and Gaëlle for their great help in the laboratory; Sylvia for her availability, patience and impeccable work; the people from 406, Julien and Elisabeth for the computing help and the good times. Not forgetting Amandine, Alex, Céline, Frédéric, Delphine, Liliana, Philippe, Sophie, Karine and Yveline, who through their friendship made these years in Geneva a pleasant experience.

Finally, thank you to my family, my friends and Nathalie; above all, my gratitude and affection go to my parents for their support, encouragement and tenderness throughout my life. A very big hug!


To my parents


Abbreviations and symbols

[I] Inhibitor concentration
4OH-Q 4-hydroxyquinoline
5-HT 5-Hydroxytryptamine (serotonin)
ΔpIC50 pIC50 MAO B - pIC50 MAO A
ΔG Free energy variation
ΔH Enthalpy variation
ΔS Entropy variation
λEm Emission wavelength
λEx Excitation wavelength
Aβ Beta-amyloid protein
AChE Acetylcholinesterase
AD Alzheimer's Disease
ADMET Absorption, Distribution, Metabolism, Excretion and Toxicity
ADP Adenosine diphosphate
ADS Astex Diverse Set
BFGS Broyden-Fletcher-Goldfarb-Shanno method
C/ATS CCDC/ASTEX Test Set
CADD Computer-Aided Drug Discovery
CCDC Cambridge Crystallographic Data Centre
CNS Central Nervous System
CS ChemScore
DA Dopamine
DMSO Dimethyl sulfoxide
FAD Flavin adenine dinucleotide
FEP Free Energy Perturbation
FN False negatives
FP False positives
GA Genetic Algorithm
GF/MF Docking failures with GOLD-standard and MLP-filter methods


GF/MF+ Docking failures with GOLD-standard and MLP-filter methods; minor improvement with MLP-filter method (MLPb  0.2)
GF/MS Docking failures with GOLD-standard, success with MLP-filter method
GOLD Genetic Optimization of Ligand Docking
GS/MF Docking success with GOLD-standard, but failure with MLP-filter method
GS/MS Docking success with GOLD-standard and MLP-filter methods
GS GoldScore
h Hill coefficient
HFP Hydrophobic Fitting Points
HTS High-Throughput Screening
HuMAO Human Monoamine Oxidase
IC50 Half maximal inhibitory concentration
Km Michaelis-Menten constant
LogP Logarithm of the partition coefficient between organic and aqueous phases
MAO Monoamine Oxidase
MC Monte Carlo
MD Molecular Dynamics
MIF Molecular Interaction Field
MLP Molecular Lipophilicity Potential
MLPb MLP-filter benefit
MLPHo Positive MLP lipophilicity values (predominance of hydrophobic interactions)
MLPPol Negative MLP lipophilicity values (predominance of polar interactions)
MLPt MLP threshold
MM Molecular Mechanics
MM/GBSA Molecular Mechanics/Generalized Born Surface Area
MM/PBSA Molecular Mechanics/Poisson-Boltzmann Surface Area
NMDA N-methyl-D-aspartate
PAS Peripheral anionic binding site
PD Parkinson's Disease
PDB Protein Data Bank
pIC50 -log IC50
Pk Pharmacokinetic


QM Quantum Mechanics
QSAR Quantitative Structure-Activity Relationships
QSPkR Quantitative Structure-Pharmacokinetic Relationships
QSPR Quantitative Structure-Permeation Relationships
QSSR Quantitative Structure-Solubility Relationships
RL Reference ligand bound into the binding site during minimization
RMSD Root Mean Square Deviation
rs Spearman's rank correlation coefficient
S Single minimization of the binding site, without bound ligand
TBVS Target-Based Virtual Screening
TI Thermodynamic Integration
v0 Reaction velocity in the absence of an inhibitor
vi Reaction velocity in the presence of an inhibitor
wat Water

Note: the compounds presented in this work are numbered consecutively across the whole thesis, not by chapter.


Chapter 1: Thesis summary

1.1 Introduction

Since their introduction in the 1980s, computational methods have become a key element of pharmaceutical research, in both academia and industry. Their field of application has broadened, and a wide variety of software is now available to gather, analyse and integrate biological and medical data; more specifically, such software makes it possible to build computational models intended to make predictions, propose hypotheses and thereby foster innovation and progress in medicine and therapeutics [1-4].

Among these many applications, docking and virtual screening techniques seek to predict and model the interactions between a molecule with therapeutic potential and its pharmacological target, most often a protein (enzyme, receptor or other). High-quality structural information is essential (generally 3D coordinates for every atom of the target protein), as is knowledge of the key elements that modulate the activity of the protein.

In this way, models can be built that not only predict how the ligand binds to its target but also estimate the affinity between the two entities. The ultimate goal is to select active molecules virtually, before verifying the relevance of the predictions with targeted in vitro tests. Such approaches are therefore complementary to biological screening and reduce the share of work devoted to conventional in vitro methods. This yields substantial savings in time and money in the long and costly process of developing drug candidates [3,5].


Figure 1.1: Illustration of the main steps of a virtual screening process. The ligands are docked into the target one by one, then their affinity for the target is estimated ("scoring"). Finally, the ligands are ranked according to their predicted affinity, so as to select the most promising candidates for biological testing.
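The dock, score and rank sequence of Figure 1.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `dock_and_score` is a hypothetical callable standing in for a real docking program, a higher score is assumed to mean a higher predicted affinity, and the scores are placeholders rather than results from this work.

```python
from typing import Callable, Iterable, List, Tuple


def rank_by_predicted_affinity(
    ligands: Iterable[str],
    dock_and_score: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Dock each ligand in turn, score it, and sort best-first.

    Assumption: a higher score means a higher predicted affinity.
    """
    scored = [(name, dock_and_score(name)) for name in ligands]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder scores standing in for a real docking run (not thesis data).
toy_scores = {"lig_a": 34.2, "lig_b": 52.1, "lig_c": 28.6, "lig_d": 49.3}

ranking = rank_by_predicted_affinity(toy_scores, toy_scores.__getitem__)
top_hits = [name for name, _ in ranking[:2]]  # candidates kept for in vitro testing
```

Only the top of the ranked list would then be passed on to biological testing; where to cut the list is a project-specific choice.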

However, many technical challenges remain before fully trustworthy predictions can be obtained for all kinds of ligands and targets. Among them, the appropriate description of certain important intermolecular interactions, the treatment of explicit water molecules and the plasticity of target proteins all remain to be overcome.

1.2 Aims of the work

In the context of improving docking and virtual screening tools, the optimization and validation of several molecular modelling procedures will be presented. These procedures will be applied to the search for inhibitors of monoamine oxidase B, an important enzyme in the field of neurodegenerative diseases. In parallel, several small compound libraries will be tested biologically to provide experimental inhibition values on which the validation of the modelling tools can be based.

1.3 Results

1.3.1 Validation of the MLP filter

In the context of docking, we focused on the molecular docking program GOLD [6], and more specifically on how factors describing the hydrophobicity of the target influence the accuracy of docking results. GOLD uses a genetic algorithm to determine the most plausible poses for a given ligand inside a protein target. To do so, it is guided by a force-field-based scoring function and by hydrophobic fitting points (HFP) onto which the carbon atoms of the ligand are placed. These points are generated before the docking process, using a Lennard-Jones potential that accounts only for steric factors. As a result, hydrophobic fitting points are generated throughout the active site, regardless of the polarity of the side chains. The carbon atoms of the ligand may therefore be guided towards unfavourable polar regions during docking. Using the molecular lipophilicity potential (MLP) [7], we sought to better characterize the hydrophobic interaction regions inside active sites. This potential is based on partition coefficients in the n-octanol/water system, and it showed that a significant fraction of the HFP defined by GOLD do lie in polar regions, as illustrated in Figure 1.2. Moreover, we showed that this bias in GOLD's definition of hydrophobicity lowers the performance of the program, mainly in pockets whose active site is ambivalent in terms of polarity.

Figure 1.2: Hydrophobic fitting points in the active site of E. coli pantothenate kinase (1sq5). The Connolly surface of the cavity is shown in green. A) HFP generated by GOLD, coloured according to their MLP value; polar regions: red, magenta, orange, yellow (decreasing polarity); hydrophobic regions: blue, blue-green, green, cyan (decreasing hydrophobicity). B) Points remaining after the MLP filter procedure (threshold value of 0). Regions surrounded by polar residues (Arg, Asp, Thr, ADP) are no longer considered hydrophobic.

The MLP filter was therefore proposed to remedy this problem. This tool submits the HFP produced by GOLD to the MLP calculation and eliminates the points whose lipophilicity value falls below an empirically defined limit. Only the points that are truly hydrophobic in the pocket (according to the MLP definition) are thus retained. The implementation of the MLP filter in a GOLD docking procedure was validated by re-docking tests on two sets of ligand-protein complexes, the CCDC/ASTEX Test Set (C/ATS, [8]) and the ASTEX Diverse Set (ADS, [9]).
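The thresholding step of the MLP filter can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the `FittingPoint` type and the `mlp_filter` function are hypothetical names, the MLP value of each point is assumed to have been computed beforehand, and keeping points that sit exactly at the threshold is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FittingPoint:
    x: float
    y: float
    z: float
    mlp: float  # MLP value at this point, assumed to be computed elsewhere


def mlp_filter(points: Sequence[FittingPoint], threshold: float = 0.0) -> List[FittingPoint]:
    """Discard fitting points whose MLP value is below the threshold."""
    return [p for p in points if p.mlp >= threshold]


# Hypothetical points: two in a lipophilic region, one in a polar region.
points = [
    FittingPoint(1.0, 2.0, 3.0, mlp=0.42),
    FittingPoint(1.5, 2.0, 3.0, mlp=-0.31),
    FittingPoint(2.0, 2.5, 3.5, mlp=0.05),
]
kept = mlp_filter(points, threshold=0.0)
```

Raising `threshold` removes more points, which corresponds to the heavier filtering discussed in the text.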

The results showed that using the MLP filter can generate pose predictions closer to the crystallographically observed pose, and therefore significantly increases the docking success rate, mainly in cavities of intermediate hydrophobicity. Since this type of cavity is the most difficult for the original GOLD algorithm to handle, the value of the MLP filter was confirmed (Figure 1.3).

[Figure 1.2, panels A and B; residues labelled in both panels: Tyr1180, His1177, Thr1126, Asn1282, Tyr1240, Arg1243 and ADP.]

Figure 1.3: Docking solutions for complex 1tow. The co-crystallized ligand is shown with carbon atoms coloured green, and the poses generated by GOLD in yellow. A) Results without the MLP filter: the algorithm cannot recover the correct pose, because HFP located at the bottom of the pocket draw the docking solutions into a polar region. B) After MLP filtering, the hydrophobic regions are better characterized and a majority of solutions (12 out of 15) reproduce the experimentally observed pose.

In very hydrophobic or very polar pockets, the original program performs well; in these cases, the MLP filter showed no significant impact. The threshold set on the MLP values was also shown to have a strong effect on docking results. On a high-quality set such as the ADS, the MLP filter never worsened docking results under moderate filtering of the HFP; on the contrary, it improved them in more than 7% of cases, bringing the docking success rate above 80%. This is all the more important because the ADS set consists mainly of targets of therapeutic interest.

[Figure 1.3, panels A and B; residues labelled in both panels: Arg126 and Tyr128.]

With a high filtering rate or on a set of lower-quality complexes (C/ATS), the algorithm sometimes has difficulty placing the ligand correctly. However, the number of cases in which the MLP filter improved docking results was always greater than the number of cases in which it degraded them. In general, the use of the MLP filter is therefore recommended for docking molecules into a structurally well-characterized target.

1.3.2 Virtual screening of coumarins on MAO B

The potential of the MLP filter was also evaluated in a focused virtual screening approach, together with other refinements of the docking strategy, such as the selection of ordered water molecules and the pre-optimization of the active site. These refinements were included individually or in combination in a re-docking study of a coumarin derivative (compound 2) in the active site of human monoamine oxidase B [10], in order to determine the docking strategy best suited to this active site. In parallel, a series of coumarins known for their MAO B inhibitory properties was docked into the same active site, following a focused virtual screening process based on the same combinatorial approach as for re-docking. Beforehand, the biological inhibition values of these coumarin derivatives had been determined by a fluorimetric method. The results were evaluated as follows. For the re-docking of coumarin 2, the positions of the solutions proposed by GOLD were compared with the position observed in the crystal structure (RMSD calculation). For the virtual screening, the positions of the coumarin core in the solutions proposed for each compound were compared with that of the reference coumarin, and the similarity was quantified between the rankings of predicted and measured affinities (Spearman correlation test) on the one hand, and between the predicted and measured classes of biological affinity on the other.
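Both evaluation measures named above can be written compactly. The following Python sketch is illustrative only: it assumes that the atoms of the two poses are listed in the same order and already share the receptor coordinate frame (no superposition, no handling of symmetry-equivalent atoms), and it computes Spearman's coefficient as the Pearson correlation of average ranks. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root mean square deviation between matched atoms of two poses."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    total = sum((a - b) ** 2 for p, r in zip(pose, reference) for a, b in zip(p, r))
    return math.sqrt(total / len(pose))


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

A coefficient close to 1 means that the predicted and measured rankings agree; the sketch does not handle the degenerate case in which all values of one series are identical.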


Figure 1.4: Binding modes of coumarin 2 in the active site of MAO B. Yellow: experimentally observed pose (X-ray crystallography). Blue: pose predicted by re-docking with several refinements (6 water molecules, pre-optimization of the active site, MLP filter).

The quality of the predictions was shown to depend strongly on the fine tuning of the calculation conditions, for both re-docking and focused virtual screening. An appropriate choice of water molecules is the most important factor, since it strongly modifies the receptor in terms of shape, polarity and hydrogen-bonding potential. The MLP filter and the pre-optimization of the receptor site have less influence on the relevance of docking solutions, but showed a synergistic improvement when combined with the inclusion of water molecules. Nevertheless, the MLP filter can prove very useful when the choice of water molecules is not possible or not reliable. The optimal settings, for both virtual screening and re-docking, consisted of the inclusion of 5 or 6 water molecules, filtering of the HFP with the MLP (threshold value of 0), and pre-optimization of the active pocket through energy minimization of the side chains and the FAD cofactor.


Figure 1.5: Visualization of the virtual and biological rankings for the coumarin compounds whose predicted binding mode is similar to the experimentally observed one (n=68), under optimal docking conditions (5 bound water molecules, MLP filter and pre-optimization of the active site). The compounds are coloured according to their inhibitory potency measured in vitro: blue, highly active compounds (pIC₅₀ above 8.5); yellow, intermediate compounds (pIC₅₀ between 7.0 and 8.5); red, weakly active compounds (pIC₅₀ below 7.0). In the left-hand column, the compounds are ranked according to their predicted affinity. The virtual and experimental data converge well.

The choice of scoring function proved central; the best results were obtained by using GoldScore, GOLD's default scoring function, for pose generation, and then re-scoring with ChemScore. With this combination, a high rate of correct binding modes (by comparison with the binding mode observed in the co-crystal) was generated. Likewise, a significant correlation between the rankings of predicted and experimental affinities was demonstrated for the coumarins. A reliable virtual screening procedure was thus developed for docking coumarin derivatives into human MAO B, allowing compounds with high biological activity to be selected virtually.
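The generate-then-rescore idea can be sketched generically in Python. The details below are assumptions made for illustration and are not taken from this work: for each ligand the pose preferred by the first scoring function is kept, that pose is re-scored with the second function, and ligands are ranked on the second score (higher taken as better). The two scoring callables are hypothetical stand-ins and do not call GOLD.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Pose = TypeVar("Pose")


def rescore_and_rank(
    poses_by_ligand: Dict[str, Sequence[Pose]],
    generation_score: Callable[[Pose], float],
    rescoring_score: Callable[[Pose], float],
) -> List[Tuple[str, float]]:
    """Pick each ligand's best pose with one score, rank ligands with another."""
    results = []
    for ligand, poses in poses_by_ligand.items():
        best_pose = max(poses, key=generation_score)  # pose selection step
        results.append((ligand, rescoring_score(best_pose)))  # re-scoring step
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy poses carrying two precomputed placeholder scores (not thesis data).
toy_poses = {
    "c1": [{"gen": 50.0, "re": 20.0}, {"gen": 55.0, "re": 18.0}],
    "c2": [{"gen": 40.0, "re": 25.0}],
}
ranked = rescore_and_rank(
    toy_poses,
    generation_score=lambda pose: pose["gen"],
    rescoring_score=lambda pose: pose["re"],
)
```

Separating the two callables makes it easy to compare different scoring function combinations on the same set of poses.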

During this study, several interesting compounds were identified, including two coumarins showing potent and selective inhibition of human MAO B (pIC₅₀ > 9). Other compounds combining inhibition of MAO B, acetylcholinesterase and amyloid protein aggregation are of clear interest for the development of drug candidates against Alzheimer's disease.
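To put these potencies in concentration terms, the list of abbreviations defines pIC50 as -log IC50 and ΔpIC50 as pIC50 MAO B - pIC50 MAO A. The short Python sketch below applies these definitions, assuming a base-10 logarithm and IC50 expressed in mol/L (the usual convention, stated here as an assumption); the function names are illustrative.

```python
import math


def pic50_from_ic50(ic50_molar: float) -> float:
    """pIC50 = -log10(IC50), with IC50 in mol/L (assumed unit)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


def ic50_from_pic50(pic50: float) -> float:
    """Inverse conversion, returning IC50 in mol/L."""
    return 10.0 ** (-pic50)


def delta_pic50(pic50_mao_b: float, pic50_mao_a: float) -> float:
    """Selectivity index: positive values favour MAO B over MAO A."""
    return pic50_mao_b - pic50_mao_a
```

Under this convention, a pIC50 above 9 corresponds to an IC50 below 1 nM, and a ΔpIC50 of 3 corresponds to a 1000-fold lower IC50 on MAO B than on MAO A.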

1.3.3 Scaffolds for reversible MAO inhibition

Several other families of compounds were studied in vitro to discover new polycyclic and heterocyclic scaffolds showing reversible inhibitory activity towards MAO A and/or B. Various compounds with potential activity against several targets involved in neurodegenerative diseases were identified, and sub-micromolar potencies were reached. Preliminary molecular modelling studies were undertaken and highlighted some structural mechanisms influencing selectivity and/or the inter-species differences between human and rat monoamine oxidase.

In conclusion, the structural knowledge obtained experimentally and in silico during this work can guide the future design of focused libraries, so as to optimize ligand-receptor interactions and thus improve potency and/or specificity.

1.4 Perspectives

The use of the MLP filter, the choice of water molecules and the pre-optimization of the target are important elements to consider when setting up docking or virtual screening strategies. Having been validated on human MAO B, our approach can be transferred to other therapeutic targets. Virtual screening of large compound databases (such as ZINC [11] and DUD [12]) will be performed with and without the optimization parameters described above, to assess their value in larger-scale virtual screening campaigns.

New chemogenomics approaches have recently emerged. They aim to explore the therapeutically exploitable (druggable) genome by systematically studying the biological affinity of low-molecular-weight ligands towards large selections of target proteins [13,14]. In this context, so-called "inverse virtual screening" methods have been developed, in which the affinity of ligands is profiled against multiple macromolecular targets instead of a single protein [15]. In such applications, optimization of the docking procedure is critical and must be tailored to each target; here, the use of the MLP filter, the choice of water molecules and the pre-optimization of the receptor are of clear potential interest.

Finally, the biological data collected will make it possible to build 3D-QSAR models for coumarins, following an in-house methodology based on VolSurf [16]. This method uses descriptors based on molecular interaction fields (MIFs), which encode hydrophobic interaction forces (MLP), polarizability and hydrogen-bonding capacity [17]. In parallel, the structure-activity relationships proposed in this work open some new synthetic perspectives for optimizing the properties of certain families of compounds.

References

[1] M. Congreve, C. W. Murray, T. L. Blundell, Keynote review: Structural biology and drug discovery. Drug Discovery Today 2005, 10, 895-907.

[2] I. M. Kapetanovic, Computer-aided drug discovery and development (CADDD): In silico-chemico-biological approach. Chem. Biol. Interactions 2008, 171, 165-176.

[3] M. Stahl, W. Guba, M. Kansy, Integrating molecular design resources within modern drug discovery research: the Roche experience. Drug Discovery Today 2006, 11, 326-333.

[4] S. Ekins, J. Mestres, B. Testa, In silico pharmacology for drug discovery: methods for virtual ligand screening and profiling. Br. J. Pharmacol. 2007, 152, 9-20.

[5] C. A. Taft, V. B. Da Silva, C. H. T. D. Da Silva, Current topics in computer-aided drug design. J. Pharm. Sci. 2008, 97, 1089-1098.

[6] M. L. Verdonk, J. C. Cole, M. J. Hartshorn, C. W. Murray, R. D. Taylor, Improved protein-ligand docking using GOLD. Proteins: Struct. Funct. Genet. 2003, 52, 609-623.

[7] P. Gaillard, P. A. Carrupt, B. Testa, A. Boudon, Molecular lipophilicity potential, a tool in 3D-QSAR. Method and applications. J. Comput. Aided Mol. Des. 1994, 8, 83-96.

[8] J. W. Nissink, C. Murray, M. Hartshorn, M. L. Verdonk, J. C. Cole, R. Taylor, A new test set for validating predictions of protein-ligand interactions. Proteins: Struct. Funct. Genet. 2002, 49, 457-471.

[9] M. J. Hartshorn, M. L. Verdonk, G. Chessari, S. C. Brewerton, W. T. M. Mooij, P. N. Mortenson, C. W. Murray, Diverse, high-quality test set for the validation of protein-ligand docking performance. J. Med. Chem. 2007, 50, 726-741.

[10] C. Binda, J. Wang, L. Pisani, C. Caccia, A. Carotti, P. Salvati, D. E. Edmondson, A. Mattevi, Structures of human monoamine oxidase B complexes with selective noncovalent inhibitors: safinamide and coumarin analogs. J. Med. Chem. 2007, 50, 5848-5852.

[11] J. J. Irwin, B. K. Shoichet, ZINC- a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 2005, 45, 177-182.

[12] N. Huang, B. K. Shoichet, J. J. Irwin, Benchmarking sets for molecular docking. J. Med. Chem. 2006, 49, 6789-6801.

[13] D. Rognan, Chemogenomic approaches to rational drug design. Br. J. Pharmacol. 2007, 152, 38-52.

[14] D. Rognan, Development and virtual screening of target libraries. J. Physiol. 2006, 99, 232-244.


[15] E. Kellenberger, N. Foata, D. Rognan, Ranking Targets in Structure-Based Virtual Screening of Three-Dimensional Protein Libraries: Methods and Problems. J. Chem. Inf. Model. 2008, 48, 1014-1025.

[16] G. Cruciani, M. Pastor, W. Guba, VolSurf: a new tool for the pharmacokinetic optimization of lead compounds. Eur. J. Pharm. Sci. 2000, 11, S29-S39.

[17] J. Boccard, F. Bajot, A. Di Pietro, S. Rudaz, A. Boumendjel, E. Nicolle, P. A. Carrupt, A 3D linear solvation energy model to quantify the affinity of flavonoid derivatives toward P-glycoprotein. Eur. J. Pharm. Sci. 2009, 36, 254-264.


Chapter 2: Introduction

2.1 Computational Approaches to Drug Discovery

Since their introduction in the 1980s, computational methods have become a key element in both industrial and academic drug discovery research [1,2]. The initial focus was on the three-dimensional visualization of macromolecules (combined with the development of X-ray crystallography), the calculation of ligand conformations, pharmacophore perception and Quantitative Structure-Activity Relationships (QSAR) in the lead optimization phase; the field of application has since become wider [3].

The acronym CADD (Computer-Aided Drug Discovery) now covers a large array of techniques that use software to retrieve, analyse and integrate biological and medical data and, more specifically, to create computational models that make predictions, suggest hypotheses and, ultimately, bring innovation and advances in medicine and therapeutics [4].

2.1.1 CADD integration in drug discovery

As computing power and software evolve, computational methods are now present throughout the drug discovery process; an increasing number of new drugs derived from computer-aided design approaches are appearing on the market [1]. We will focus here on the early stages of drug discovery, where computational applications are extensively used for target assessment, hit identification and lead optimization (see Figure 2.1).

When a new therapeutic target is proposed, its “druggability” must be assessed, i.e., the potential to generate small-molecule ligands with the desired activity and selectivity profile must be evaluated [1,5]. Such an assessment requires collecting and/or producing relevant data to describe any known ligands and their target binding site (from known 3D structures, homology models, pharmacophore models or by binding pocket identification techniques). Clear insight into the key amino acids modulating selective binding or functional activity is also necessary [3,6]. This can ultimately guide the selection of suitable compound sets for the next key step in the drug discovery project: the in vitro screening experiments (high-throughput or focused).

Figure 2.1 : Involvement of computer-based methods (orange and red arrows) in the drug discovery process.

[Figure 2.1 labels. Process stages: Target identification & assessment; Hit/Lead identification; Lead optimization; Clinical trials; in vitro (HTS) screening. CADD activities: druggability assessment; data collection & model building (X-ray structures, homology modeling, pharmacophores, target fishing); virtual screening; de novo design; pharmacophore-based design; scaffold hopping; chemogenomics; target fishing; ADME-Tox profiling and filtering; in-depth docking; focused virtual screening; multivariate SAR models; QM-based calculations; molecular dynamics; in-depth ADMET calculations; data analysis, empirical model building.]

The identification of active hit compounds is essential to establish appropriate starting points for medicinal chemistry. While this task remains mostly delegated to in vitro high-throughput screening (HTS), complementary computational techniques have emerged and are becoming increasingly important [2,3]. Given the relatively disappointing results obtained by random HTS [7,8], and the huge infrastructure and logistics required (not only for the screening process but also for the preliminary synthesis, storage and maintenance of compounds [1]), screening has evolved towards the production of less diverse and more target-focused libraries, which are more likely to save time and money in drug development by removing, at early stages of the process, compounds that are unlikely to bind to the target [9,10]. In this area, CADD contributions are multiple, including virtual screening [4,8,10-12], de novo design [13-15], fragment-based [16] and pharmacophore-based methods [17], scaffold hopping [18,19], chemogenomics [20,21], target fishing [22] as well as the design of in silico filters to eliminate compounds with undesirable properties (poor activity and/or poor absorption, distribution, metabolism, excretion and toxicity, ADMET) [23-25].

Ultimately, such computational approaches aim to help select one or more of the most promising molecules for further evaluation. These so-called “lead compounds” are then promoted to the optimization stage.

Many in silico techniques are also applied during the lead optimization process, where docking [26-30], ADMET calculations and predictions, quantum mechanics (QM)-based calculations [31-33], molecular dynamics (MD) and the whole QSxR family of approaches (i.e. QSAR, QSPR, QSSR, QSPkR, etc.) rely extensively on computer resources to transform biologically active leads into suitable drug candidates by improving their physicochemical, pharmaceutical and ADMET/Pk (pharmacokinetic) properties [2].

In the context of data analysis, the processing of the increasingly huge data sets produced by the in vitro and/or in silico screening campaigns demands the involvement of advanced data analysis software, where the use of techniques such as machine learning (support vector machine, recursive partitioning), self-organizing maps, hierarchical clustering, K-nearest neighbour analysis and others [34,35], is becoming preponderant. In the same field, the emerging techniques of systems biology also mine vast amounts of biological data in order to model cell signalling pathways [36].

Ultimately, this synergistic in silico effort is aimed at minimizing the time and resources required for chemical synthesis and biological testing, so as to optimize the drug discovery process [2]. Several success stories have been reported, such as the development of a novel, potent and selective antidepressant 5-HT1A agonist in less than 2 years from project start to clinical trials, where a focused in silico modeling program allowed less than 6 months to be spent on lead optimization and required the synthesis of only 31 compounds [37].

2.1.2 Docking & target-based virtual screening in drug discovery

Target-based techniques are now established as important tools for reducing the size of the chemical space to be searched and for selecting more promising candidates for lead discovery and optimization [2]. They have benefited from the exponentially rising amount of available structural data, and from the increasing accuracy of 3D protein models [1,34]. Among these techniques, target-based virtual screening (TBVS) has emerged as an effective way to prioritize compounds prior to physical screening. It commonly consists of a quick and rough prediction of the binding affinities of the compounds in a given dataset by means of high-throughput docking to a 3D structure of the target receptor [38].

TBVS is now widely used to discover new drug candidates from different chemical scaffolds by searching commercial, public, or proprietary 3D chemical structure databases [2]. The compounds forming these databases are scored and ranked according to their predicted likelihood of having an affinity for the target considered [10]. In combination with appropriate ADMET/Pk filtering of undesirable compounds, TBVS aims to accurately select sets of molecules with required properties (activity, drug-likeness, lead-likeness) for subsequent biological testing, thus avoiding time and money expenditures [2,34,39].


Figure 2.2 displays an overview of a typical TBVS routine. The target selection, database preparation and pre-filtering are beyond the scope of this introduction; we will focus on the docking phase itself and review some of the key issues encountered in this field nowadays.

Figure 2.2 : Typical pipeline of tasks in target-based virtual screening, adapted from Waszkowycz et al. [38]. Stages named in the figure: database & model selection; database preparation; database pre-filtering; receptor docking; post-processing & analysis; final selection. Tasks listed in the figure: selection/preparation of receptor models; commercial/corporate compound collections; virtual libraries; removal of non-druglike entities; ADMET filtering; assignment of protonation/tautomerization; conformer generation; visualization of property distributions; visualization of binding modes; final selection guided by project goals; targeted property profiles; diversity selection; ligand/receptor-based pharmacophores; high-throughput docking of compounds; multiple docking programs; retain multiple poses per ligand; calculation of additional properties/scores; classification of binding interactions; elimination of unpromising ligands; modeling refinements (QM, MD, ...), re-scoring.

Docking comprises two distinct tasks: first, the prediction of favourable binding geometries for a small molecule in the binding site of the target protein (the “posing” step); and second, the estimation of the binding free energy of the complex so formed (the “scoring” step) [27,29,40]. When different compounds are docked in the same target, they are subsequently “ranked” according to their relative scores (Figure 2.3).
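The posing, scoring and ranking sequence can be illustrated with a minimal Python sketch. It assumes that a docking run has already produced several scored poses per ligand and that a higher score means a better predicted fit; the ligand names and scores are hypothetical, and `rank_ligands` is an illustrative helper, not part of any docking program.

```python
def rank_ligands(poses):
    """Keep the best-scoring pose of each ligand, then rank ligands by that score.

    poses: iterable of (ligand_name, score) pairs, one per docked pose.
    Assumption: higher score = better predicted binding.
    """
    best = {}
    for ligand, score in poses:
        if ligand not in best or score > best[ligand]:
            best[ligand] = score
    # Sort ligands from highest to lowest best-pose score.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical output of the posing and scoring steps (two poses per ligand).
poses = [
    ("lig_A", 37.6), ("lig_A", 52.1),
    ("lig_B", 49.3), ("lig_B", 33.0),
    ("lig_C", 25.3), ("lig_C", 34.2),
]
ranking = rank_ligands(poses)
for position, (ligand, score) in enumerate(ranking, start=1):
    print(position, ligand, score)
```

Selecting the top pose per ligand before ranking mirrors the two roles of a score: choosing among poses of one ligand, then comparing different ligands.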

Figure 2.3 : Illustration of the two principal phases of a docking protocol, the “posing” and “scoring” steps, applied to the fictitious virtual screening (with an additional “ranking” step) of various compounds into the 1kzk binding site.

Docking algorithms

Many software packages are available to meet the growing need for fast, reliable pose prediction in target-based drug design. While early algorithms considered ligands as rigid (lock-and-key), most current approaches are able to fully explore ligand flexibility within the target. These methods can be divided into three basic categories (see Table 2.1): systematic methods (conformational search, fragmental construction, databases), simulation methods (molecular dynamics), and stochastic methods (genetic algorithms, Monte Carlo, tabu search) [27,41,42].

Systematic methods try to explore all degrees of freedom in a molecule, but the number of evaluations needed rises exponentially as the number of degrees of freedom increases. This is particularly evident for conformational search methods such as EUDOC [43] and FRED [44]. To deal with this exponential rise, termination algorithms are applied to remove unfavourable conformations early on, thereby reducing the complexity of the problem [41,42].

Fragmental construction generally relies on the preliminary placement of a “core” ligand fragment into the binding site, followed by the successive addition of the other fragments to recreate the original ligand. A variant that places all fragments before trying to link them covalently is also used [27]. LUDI [45], FlexX [46], Surflex [47], ADAM [48], Hammerhead [48], SLIDE [49] and DOCK from version 4.0 onwards [50] are some examples of docking programs using a fragmentation search method. While this approach can be well-suited for de novo design, it sometimes requires a subjective choice of the core fragment, and a subjective evaluation of its pharmacophoric importance. This can have significant repercussions on the docking accuracy [41,51].

The third class of systematic methods is known as the database method. To avoid the combinatorial explosion problem, libraries of pre-generated ligand conformations are used to account for ligand flexibility [27]. For example, FLOG [52] selects up to 25 ligand conformations from a database and matches them to the receptor by means of a shape-complementarity algorithm, prior to a simplex minimization.
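The combinatorial explosion that systematic methods face can be made concrete with a short calculation. The sketch below assumes a uniform torsion grid in which every rotatable bond is sampled in fixed angular steps; the 30° step and the function name are illustrative assumptions, not settings of EUDOC, FRED or any other program cited here.

```python
def grid_conformer_count(n_torsions, step_degrees=30):
    """Number of conformers on a uniform torsion grid: (360 / step) ** n_torsions."""
    if 360 % step_degrees != 0:
        raise ValueError("step_degrees must divide 360 evenly")
    states_per_torsion = 360 // step_degrees
    return states_per_torsion ** n_torsions


# Each additional rotatable bond multiplies the search space by 12 at a 30-degree step.
for n_torsions in (2, 5, 10):
    print(n_torsions, grid_conformer_count(n_torsions))
```

This count covers ligand torsions only; the six rigid-body degrees of freedom (translation and orientation) enlarge the search space further, which is why early pruning or pre-generated conformer libraries are needed.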

Simulation methods are based on the calculation of solutions to Newton’s equations of motion in various force fields. Molecular dynamics is a popular and accurate simulation approach for all types of studies of the dynamic behaviour of proteins and smaller molecules; however, several drawbacks limit its current routine application in docking or virtual screening. First, MD simulations are usually very time-consuming. Second, high energy barriers must be crossed within reasonable simulation times to avoid leaving ligands trapped in local minima of the energy surface. For that reason, different approaches have been developed to allow a better exploration of the binding free energy landscape: increasing the temperature of the simulation, “smoothing” the potential energy surface, starting molecular dynamics calculations from different ligand positions, or metadynamics [41,42,53]. In any case, the time consumption currently limits the use of MD to the study of single protein-ligand complexes [54-56]. However, the possibilities are expanding, since MD simulations can be used at different steps in the docking process: during the preparation of the protein receptor before docking, to optimize its structure and account for protein flexibility; for the refinement of docking results, to account for solvent effects and simulate the induced fit; and to calculate binding free energies, thus allowing an accurate ranking of the ligands [53].

Table 2.1 : Some of the most common software for flexible ligand-protein docking, classified following the nature of their search algorithm.

SYSTEMATIC METHODS
  Conformational search: EUDOC [43], FRED [44]
  Fragmental construction: LUDI [45], FlexX [46], Surflex [47], ADAM [48], Hammerhead [48], SLIDE [49], DOCK [50]
  Databases: FLOG [52]

SIMULATION METHODS
  Molecular dynamics

STOCHASTIC METHODS
  Genetic algorithms: AutoDock [57], GOLD [58]
  Monte Carlo: ICM [59], Prodock [60], QXP [61]
  Tabu search: PRO_LEADS [62]

Stochastic or random methods represent the most widely used docking algorithms, accounting for approximately 50% of the citations retrieved from ISI Web of Science for any of the original references in 2005. To sample the posing space, they perform iterative random changes to a ligand or a population of ligands. At each step, the alteration is evaluated by a predefined probability function and subsequently accepted or rejected, so as to rapidly converge towards a global optimum [27].

The two leaders, AutoDock [57] and GOLD [58,63,64], use a genetic algorithm (GA) to find the global energy minimum. Genetic algorithms apply ideas derived from genetics and the Darwinian theory of evolution to answer questions arising in many different fields, docking being only one of them. In this context, GAs start from an initial population of different conformations, each defined by a set of state variables encoded in a gene. Such variables can account for the translation, orientation, conformation, or H-bonding potential of the ligand in relation to the protein receptor. The “genotype” of the ligand is thus defined by the full set of state variables, whereas its “phenotype” corresponds to the atomic coordinates. A fitness function is applied to select the fittest “individuals” of the population, which are carried on to the next generation. Genetic operators (mutations, crossovers, and migrations) are applied to the chromosomes representing the best intermediate solutions so as to sample the whole posing space, until a final population that optimizes the predefined fitness function is reached [27,41,42,65].

In Monte Carlo (MC) simulations, new conformations are generated and scored at each step; a Metropolis criterion is then applied to accept or reject the new configuration. If a new solution scores better than the previous one, it is immediately accepted. If the configuration is not a new minimum, acceptance of the newly obtained pose is based on a Boltzmann probability function. Compared to MD methods, MC techniques have the significant advantages that they use simpler energy functions and that energy barriers can be stepped over more easily [27,42,51]. Tools such as ICM [59], Prodock [60] or QXP [61] are based on MC algorithms.

Finally, tabu searches as found in PRO_LEADS [62] are very similar to MC methods. Again, random moves are performed and the energy is evaluated; however, all solutions are recorded in a “tabu” list. New optimum solutions are only kept if they are different from the previous solutions. Therefore, the sampling of unexplored conformational space is encouraged [27,41,42].
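The Metropolis acceptance rule described above for Monte Carlo docking can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the implementation of ICM, Prodock or QXP: the "pose" is a single coordinate `x`, `toy_energy` is a hypothetical double-well surface whose lower well lies near x = -1, lower energy is taken as better, and `kT`, the step size and the number of steps are arbitrary.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng=random):
    """Metropolis criterion: always accept downhill moves, accept uphill
    moves with Boltzmann probability exp(-(e_new - e_old) / kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def toy_energy(x):
    # Hypothetical 1D double well: local minimum near x = +1, lower minimum near x = -1.
    return (x * x - 1.0) ** 2 + 0.3 * x


def monte_carlo(n_steps=5000, kT=0.2, step=0.5, seed=0):
    """Random walk on toy_energy with Metropolis acceptance; returns the best state seen."""
    rng = random.Random(seed)
    x = 1.0                      # start in the higher (local) well
    e = toy_energy(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)   # random perturbation of the "pose"
        e_new = toy_energy(x_new)
        if metropolis_accept(e, e_new, kT, rng):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


print(monte_carlo())
```

Because uphill moves are occasionally accepted, the walk can step over the barrier between the two wells, which is the advantage over purely downhill minimization noted in the text. A larger `kT` makes barrier crossing easier but the sampling less selective.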


Most of these techniques do not address the issue of target flexibility, or do so only partially (with the notable exception of MD simulations, which are able to account for protein flexibility). However, recent developments in some programs allow the exploration of protein flexibility to a certain extent. This aspect will be further discussed in § 2.1.3.

Scoring functions

The formation and stability of a ligand-protein complex are determined by a subtle equilibrium of electrostatic interactions (hydrogen bonds, ionic contacts) and hydrophobic contacts (accounting for enthalpy variations, ΔH), as well as solvation/desolvation phenomena and changes in conformational freedom (accounting for entropy, ΔS). These contributions lead to a variation of the free energy ΔG which characterizes the binding affinity between the ligand and the target [66-68] (see Equation 2.1, where T is the temperature).

$\Delta G = \Delta H - T\,\Delta S$    Equation 2.1
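To give a feel for the magnitudes involved, Equation 2.1 can be combined with the standard thermodynamic relation ΔG = RT ln Kd, which links the binding free energy to a dissociation constant. This relation is textbook thermodynamics and is not stated in the text above. The Python sketch below uses hypothetical values of ΔH and ΔS chosen only for illustration, and the function names are not taken from any package.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(delta_h, delta_s, temperature=298.15):
    """Equation 2.1: dG = dH - T*dS, with dH in kJ/mol and dS in kJ/(mol*K)."""
    return delta_h - temperature * delta_s


def dissociation_constant(delta_g, temperature=298.15):
    """Kd in mol/L from dG = RT ln(Kd), assuming a 1 M standard state."""
    return math.exp(delta_g / (R_KJ_PER_MOL_K * temperature))


# Hypothetical binding event: favourable enthalpy, unfavourable entropy.
dg = binding_free_energy(delta_h=-50.0, delta_s=-0.0335)
kd = dissociation_constant(dg)
print(f"dG = {dg:.1f} kJ/mol, Kd = {kd:.2e} M")
```

With these illustrative inputs the entropic penalty offsets part of the enthalpic gain, and the resulting Kd is of the order of 10^-7 M. Since RT is about 2.5 kJ/mol at room temperature, an error of a few kJ/mol in ΔG already shifts the predicted Kd by roughly an order of magnitude, which is why approximate scoring has such a visible cost in ranking accuracy.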

In the context of docking, the approximation of the binding free energy is generally delegated to the scoring functions, which aim to evaluate the complementarity of the ligand and the target structure. A reliable estimation of the binding affinity is essential, since scoring intervenes at various stages of the docking process: it has to guide the docking search algorithm toward reasonable poses of a ligand; it must distinguish the best or “correct” pose from alternative docking solutions of the same ligand suggested by the algorithm; and finally, it should be able to correctly rank different ligands following their estimated binding affinity [40]. However, an accurate prediction of binding free energies can hardly be reached by calculations in an affordable timescale for most drug-design projects. Indeed, rigorous methods such as free energy perturbation (FEP), umbrella sampling and thermodynamic integration (TI), or even simpler models like molecular mechanics/Poisson-Boltzmann surface area (MM-PBSA), MM generalized Born SA (MM-GBSA) and Linear Interaction Energy (LIE) methods [53,66,69,70], are computationally expensive, and thus not applicable to a rapid analysis of several binding modes or ligands. Hence, a number of assumptions and simplifications have to be made to reduce the complexity of the scoring problem, with an obvious cost in terms of accuracy [8,27,42].

Various strategies have thus been developed to partially answer the scoring problem, leading to three major classes of scoring functions: force field-based, empirical, and knowledge-based.

Force-field-based scoring functions rely on classical molecular mechanics calculations and usually quantify the sum of two energies: the receptor–ligand interaction energy and the internal ligand energy (such as steric strain induced by binding). They approximate the binding free energy by a sum of van der Waals and electrostatic interactions, as well as nonpolar contributions [42,71,72]. The various existing scoring functions are based on different force-field parameter sets; for example, GScore is based on the Tripos force field [46] and AutoDock 3.0 [57] and 4.0 [73] on the AMBER force field [74]. Other binding affinity estimation methods based on molecular mechanics are GoldScore [63], QXP [61], GRID [75] or CHARMm [76]. Standard force-field scoring functions have several limitations, in that they do not always take solvation and entropic terms into account. Besides, force-field-based scoring generally requires the introduction of cut-off distances for the handling of non-bonded interactions, which are more or less arbitrarily chosen. This complicates the accurate estimation of long-range effects involved in binding [42]. Finally, force-field-based functions are quite time-consuming and are very sensitive to erroneous estimations of partial charges and protonation states, as well as to inaccuracies in the protein structure models [77].
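The general shape of a force-field interaction term, including the cut-off distance mentioned above, can be sketched as follows. This is a generic illustration using a Lennard-Jones 12-6 term and a Coulomb term, not the functional form or parameter set of GoldScore, AutoDock or any other function cited here. The single `epsilon` and `sigma` shared by all atoms, the constant dielectric of 4.0, the 10 Å cut-off and the coordinates are all assumptions made for brevity.

```python
import math

COULOMB_KCAL = 332.06  # converts e^2/Angstrom to kcal/mol


def pair_energy(r, epsilon, sigma, q_i, q_j, dielectric=4.0, cutoff=10.0):
    """Non-bonded energy (kcal/mol) of one atom pair at distance r (Angstrom)."""
    if r >= cutoff:
        return 0.0  # interactions beyond the cut-off are simply ignored
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)   # van der Waals term
    coulomb = COULOMB_KCAL * q_i * q_j / (dielectric * r)  # electrostatic term
    return lennard_jones + coulomb


def interaction_energy(ligand, receptor, epsilon=0.1, sigma=3.4, **options):
    """Sum pair energies over all ligand-receptor atom pairs.

    Each atom is given as ((x, y, z), partial_charge).
    """
    total = 0.0
    for pos_l, q_l in ligand:
        for pos_r, q_r in receptor:
            r = math.dist(pos_l, pos_r)
            total += pair_energy(r, epsilon, sigma, q_l, q_r, **options)
    return total


# Hypothetical one-atom ligand and two-atom receptor; the second receptor atom
# lies beyond the cut-off and therefore contributes nothing.
ligand = [((0.0, 0.0, 0.0), -0.4)]
receptor = [((3.8, 0.0, 0.0), 0.3), ((0.0, 15.0, 0.0), 0.5)]
print(interaction_energy(ligand, receptor))
```

The discarded contribution of the distant charged atom shows how an arbitrary cut-off truncates long-range electrostatics, and the explicit dependence on `q_i` and `q_j` shows why errors in partial charges or protonation states propagate directly into the score.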

Empirical scoring functions assume that binding energies can be approximated by a sum of individual uncorrelated terms. The coefficients of these various terms are obtained by regression analysis of experimentally determined binding energies and X-ray crystallographic data. Hence, they are intended to reproduce experimental data, such as binding energies and/or conformations [42]. However, the quality of the training sets and the homogeneity of the experimental data used can limit the correctness of the binding free energy estimation. To achieve correct estimations, computational chemists must train the scoring functions on complexes similar to those of their drug design projects, and update them continually. Similarly, transferability issues arising from the differences between the docked ligand and the molecules in the training sets of the scoring function have repercussions on the predicted binding affinity [41]. Nevertheless, empirical scoring functions are relatively rapid, and are widely used in programs such as FlexX [78,79], SCORE [80], PLP [81], ChemScore [82,83], LUDI [45], X-Score [84] and the more recent Protein Alpha Shape Similarity Analysis (PASSA) [77].

Knowledge-based scoring functions are based on analysis of structural data alone, without reference to experimentally determined binding affinities. Paired atom-atom interactions in known structures of protein-ligand complexes are statistically analyzed. The frequency of occurrence of these atomic pair interactions is used as a measure of their energetic contribution to binding. The frequencies are compared to frequencies from a random or average distribution, following an inverse formulation of the Boltzmann law. A high frequency indicates an attractive interaction, while a low frequency indicates a repulsive interaction [77]. The computational simplicity of knowledge-based scoring functions allows the rapid screening of large compound databases. DrugScore [85], SMoG [86], BLEEP [87], PMF [88] and the Astex Statistical Potential (ASP, [89]) are based on this approach. However, the relatively limited sets of protein–ligand complex structures from which the information is derived can cause erroneous estimations for atypical ligands and/or proteins [42].
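The inverse Boltzmann relation behind knowledge-based potentials can be written as E = -kT ln(f_observed / f_reference). The Python sketch below applies it to hypothetical contact frequencies for one atom-pair type in three distance bins; the numbers, the bin labels and the function name are illustrative assumptions, and the reference state is simply given rather than derived as in DrugScore, PMF or ASP.

```python
import math


def pair_potential(observed_freq, reference_freq, kT=1.0):
    """Inverse Boltzmann: E = -kT * ln(f_observed / f_reference), in units of kT."""
    if observed_freq <= 0 or reference_freq <= 0:
        raise ValueError("frequencies must be positive")
    return -kT * math.log(observed_freq / reference_freq)


# Hypothetical normalized frequencies of one atom-pair type per distance bin:
# (observed in known complexes, expected from the reference distribution).
bins = {
    "2.5-3.0 A": (0.20, 0.10),  # seen twice as often as expected
    "3.0-3.5 A": (0.10, 0.10),  # seen as often as expected
    "3.5-4.0 A": (0.05, 0.10),  # seen half as often as expected
}
for label, (observed, reference) in bins.items():
    print(label, round(pair_potential(observed, reference), 3))
```

Over-represented contacts receive a negative (attractive) energy and under-represented ones a positive (repulsive) energy, as stated in the text. Bins with very few observations make the ratio unreliable, which reflects the limitation noted above for atypical ligands and proteins.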


Table 2.2 : Common scoring functions for the approximation of the binding free energy in the field of docking.

FORCE-FIELD BASED
  GScore [46], AutoDock [57,73], GoldScore [63], QXP [61], GRID [75], CHARMm [76]

EMPIRICAL
  FlexX [78,79], SCORE [80], PLP [81], ChemScore [82,83], LUDI [45], X-Score [84], PASSA [77]

KNOWLEDGE-BASED
  DrugScore [85], SMoG [86], BLEEP [87], PMF [88], ASP [89]

As all scoring methods have their own pros and cons, several authors developed the concept of consensus scoring [90-93]. While this approach yielded better results than using a single scoring function in some studies, one should use such tools warily. Indeed, if terms in different scoring functions are intercorrelated, this could amplify calculation errors rather than balance them out. Similarly, if one function is much worse than all the others, it will strongly degrade the consensus results. Therefore, the choice of the scoring functions is critical. Finally, the use of multiple scoring functions can also bury the structural and chemical meaning of the results obtained under statistical noise [41,42].
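One simple way to combine several scoring functions is vote counting: a compound receives one vote from each function that places it in that function's top fraction. The Python sketch below is an illustration of this idea only, not necessarily the scheme used in references [90-93]. It assumes that a higher score is better for every function (real scoring functions differ in sign convention), and the function names, compound names and scores are hypothetical.

```python
def consensus_votes(score_tables, top_fraction=0.5):
    """Count how many scoring functions rank each compound in their top fraction.

    score_tables: {function_name: {compound: score}}, higher score = better (assumption).
    """
    votes = {}
    for scores in score_tables.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        n_top = max(1, round(top_fraction * len(ranked)))
        for compound in ranked:
            votes.setdefault(compound, 0)   # make sure every compound appears
        for compound in ranked[:n_top]:
            votes[compound] += 1
    return votes


# Hypothetical scores of four compounds from three scoring functions on different scales.
score_tables = {
    "func_1": {"c1": 9.0, "c2": 7.5, "c3": 3.0, "c4": 1.0},
    "func_2": {"c1": 55.0, "c2": 20.0, "c3": 60.0, "c4": 10.0},
    "func_3": {"c1": 0.8, "c2": 0.9, "c3": 0.1, "c4": 0.2},
}
votes = consensus_votes(score_tables)
print(sorted(votes.items(), key=lambda item: item[1], reverse=True))
```

Working on ranks rather than raw scores avoids mixing incompatible scales, but it does not remove the caveats raised in the text: correlated functions effectively vote twice, and a poor function still contributes a full vote.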

2.1.3 Current limitations of docking techniques and TBVS

Several factors restrict the reliability of current docking and TBVS methods. First, structural data imperfections must be considered. The crystallization process and the modeling of the 3D structure from electron density maps are subject to experimental or methodological flaws [94,95], leading to a structure that is only the result of an averaging in time and space of the positions of the individual atoms forming the crystal. Given the resolution currently obtained for X-ray structures, the position of hydrogen atoms cannot be unambiguously determined. This has major implications for the assignment of protonation states, and thus for the H-bonding properties of some of the ligands and residues. Likewise, the imidazole ring of histidine can adopt two virtually identical orientations, which are difficult to distinguish [27]. Finally, the use of structures built by homology modeling techniques should raise questions regarding the validity of the model [9,96]. Careful inspections should thus be performed prior to the choice of a target structure.

While static 3D crystallographic structures are often used “as is” in docking studies, it must be kept in mind that they cannot be considered a “true” representation of the target. Indeed, proteins are mobile molecules, existing in multiple conformational states separated by energy barriers of variable height. Besides, these conformations are not equivalent regarding their ability to bind a ligand [97,98]. The presence of a ligand itself will provoke a certain extent of side-chain or even backbone rearrangement (“induced fit”) [9,99,100], which has been shown to have strong repercussions on docking accuracy depending on the target [101,102]. Therefore, several attempts have been made to include protein flexibility in docking. The latest versions of GOLD [103], FlexE [104], ICM [105], AutoDock [106] or SLIDE [100] include various methods able to simulate, to some extent, moderate side-chain movements as observed in induced-fit situations. These methods include rotamer libraries, average structures, or receptor ensembles. The GLIDE/PRIME iterative procedure [107] is another interesting method; in parallel, a range of models and computational tools have recently been developed to accommodate protein flexibility in docking: SCARE [108], FlipDock [109], FlexScreen [110], and others [111-113]. However, these approaches can still be improved and remain to be extensively and independently validated.

Another major issue is the treatment of solvation, both at the docking and scoring stages. While early methods mainly relied on implicit solvent models, such approaches cannot simulate the important role of water molecules in molecular recognition (H-bond bridges, displacement/entropy effects) [114-118]. This issue will be more thoroughly discussed in paragraph 4.2.1 (Chapter 4). Explicit water molecule handling has recently been implemented in various docking and scoring programs (some even offer water toggling options to simulate the displacement of particular water molecules) and shows overall positive results; although the final answer to the solvation problem is not yet within reach, significant progress is being made [64,119-121].

Several studies have shown that the docking algorithms currently at hand are relatively reliable in terms of pose prediction. Indeed, correct poses are often generated by the search algorithm; however, the scoring function is not always able to recognize them. The scoring process is therefore not yet mature [38,42,122,123], leading to limited accuracy in ranking compounds from diverse series, and even more so from homologous series. Hence, the virtual screening of large datasets often leads to a large number of false positives (high-scoring decoy compounds), whereas known ligands remain buried down the rankings (false negatives) [2].

The increasing size of the virtual libraries requires extensive computational power, particularly for methods taking target flexibility into account [2]. In the same context, current docking programs and scoring functions need extensive parameter adjustment in order to retrieve the best possible results for a given target [30]. Expert handling is thus required for each program, not only for the setting of the docking parameters, but also for the interpretation of the data retrieved. This complicates the task of comparing different docking and TBVS protocols [3,72,123].

Finally, another limitation worth mentioning is the synthetic feasibility of virtual ligands created by de novo design. Indeed, novel compounds displaying a high virtual affinity for their target might prove very difficult and costly to synthesize [9].


2.1.4 Conclusions and future trends in target-based virtual screening

The use of computational techniques in the drug discovery process is rapidly gaining in popularity and performance. High attrition rates and elevated costs in pharmaceutical research are strong incentives to develop alternative in silico methods to predict drug activity, specificity, kinetic fate and toxicity. Given the increasingly large amount of biological, physicochemical, toxicological and structural data available, computational models are key tools to help steer drug discovery processes, by bringing the existing knowledge together in a quantitative framework. Naturally, models are not always reliable, and a critical evaluation of the calculation results is essential. The available resources, the data quality, the limits of the computational tools and the pertinence of the built model are essential points to be evaluated, and demand a sound awareness of what is realistically achievable [124,125]. CADD techniques should thus not be used in a black box manner, but with a critical understanding of the global in silico-chemico-biological process [2].

In the context of target-based virtual screening, there is still much to be done to improve the posing and scoring processes. While a single, universal scoring function might remain out of reach for the foreseeable future, improved performance can be achieved by better modeling of solvation and entropy phenomena. An interesting approach is the use of high-precision scoring methods to re-score initial hits from virtual screening, obtained with a faster, rougher function [8]. Similarly, developing specific scoring functions targeted towards a particular protein class can improve the reliability of the binding energy estimation.

Finally, as the involvement of CADD in the drug discovery and development processes increases, it is essential to optimize the interface between the experimental and virtual processes, so as to maintain an overview of the whole workflow [3,34].


2.2 Scope and aims: validation of computational procedures supporting target-based molecular modeling

Our laboratory’s expertise in the description and prediction of lipophilic interactions has led to the development of several molecular modeling procedures. As outlined in the previous paragraphs, current docking and scoring techniques still need improvement. In this context, an extensive validation of an in-house hydrophobicity description tool and of other modeling refinements will be performed, with the aim of improving the predictive capacity of docking and target-based virtual screening approaches. An application of these procedures will be conducted in the context of human monoamine oxidase B (HuMAO B) inhibition, an important drug target in the field of neurodegenerative disorders. In parallel, several small focused compound libraries will be biologically screened against the latter target to provide experimental data supporting the validation of the modeling tools, and to select novel hit compounds able to inhibit MAO B.

Outline of the thesis

While this chapter presented a review of state-of-the-art computational tools used in the drug discovery process, the next will address a key element in target-based drug design: the hydrophobicity description of the target. As previously pointed out, nonpolar contacts between a ligand and its target are important interactions in the binding process, and thus essential contributions to be handled by docking programs. Therefore, a novel tool aiming at accurately describing hydrophobic regions at the binding site [126] was implemented into a GOLD [58] docking strategy. An extensive validation of this tool - namely, the MLP-filter - will be presented. The MLP-filter is based on the Molecular Lipophilicity Potential (MLP), an in-house tool that was
