Multiple Applications of Structure-based Methods in Drug Discovery


Thesis

Reference

Multiple Applications of Structure-based Methods in Drug Discovery

TESSARO, Francesca

Abstract

The present work aims to cover different aspects of computational tools, focusing in particular on structure-based methods. A general description of these methods precedes the first chapter, which focuses on the evolution and current state of the art of molecular docking methods, in particular when they are challenged with docking small molecules to nucleic acids.

Subsequently, two case studies are presented. The first, a ‘protein target’ study, presents the rational investigation of a nanomolar compound (49c), which has been discovered to inhibit a novel class of aspartic proteases, namely Toxoplasma gondii aspartyl protease 3 and Plasmodium falciparum plasmepsins IX and X, that are crucial for the respective obligate parasites.

The second study presents a combined in silico and phenotypic approach that led to the discovery of PK4C9, a small-molecule modulator targeting TSL2, an RNA motif implicated in the pathogenesis of spinal muscular atrophy. PK4C9 exerts its therapeutic effect by correcting splicing and restoring the lacking levels of SMN protein.

TESSARO, Francesca. Multiple Applications of Structure-based Methods in Drug Discovery. Thèse de doctorat : Univ. Genève, 2020, no. Sc. Vie 61

URN : urn:nbn:ch:unige-1422527

DOI : 10.13097/archive-ouverte/unige:142252

Available at:

http://archive-ouverte.unige.ch/unige:142252


Laboratory of Pharmaceutical Biochemistry, Professor Leonardo Scapozza

Multiple applications of structure-based methods in drug discovery

THESIS

presented to the Faculties of Medicine and Science of the University of Geneva to obtain the degree of Doctor of Science in Life Sciences, specialisation in Pharmaceutical Sciences

by

Francesca Tessaro

from Padova (Italy)

Thesis No. 61

Geneva

Atelier de reproduction Repromail 2020


To Chiara


Abbreviations

ASP Aspartyl protease

ATc Anhydrotetracycline

CADD Computer-aided drug discovery

DMSO Dimethyl sulfoxide

DNA Deoxyribonucleic acid

E7 Exon 7

EM Electron microscopy

GB Generalized Born

GPU Graphics processing units

HTVS High throughput virtual screening

IC50 Half maximal inhibitory concentration

IC Incremental construction

LB Ligand based

LBVS Ligand based virtual screening

MD Molecular dynamics

MIC Minimal inhibitory concentration

NA Nucleic acid

NDB Nucleic acid data bank

NMR Nuclear magnetic resonance

PB Poisson-Boltzmann

PDB Protein data bank

PfPMIX Plasmodium falciparum Plasmepsin IX

PfPMX Plasmodium falciparum Plasmepsin X

QSAR Quantitative structure activity relationship

RMSD Root mean square deviation

RNA Ribonucleic acid

SaFabI Staphylococcus aureus enoyl-acyl carrier protein reductase

SAR Structure activity relationship

SB Structure based

SMA Spinal muscular atrophy

SMN Survival motor neuron

TgASP3 Toxoplasma gondii Aspartyl protease 3

TAR Trans-activation response element

TSL2 Terminal stem loop 2

VS Virtual screening

Summary

The present work aims to cover different aspects of computational tools, with a particular focus on structure-based methods. The thesis is outlined as follows: a general description of these methods, with a more in-depth analysis of the history and development of molecular docking, followed by their application in two case studies that elucidate their impact on drug discovery research.

As an introductory note, I have briefly analysed the intricate journey that brings a single molecule to become an approved drug, which on average requires 9 to 15 years, with costs that reach $2-3 billion. In response, computational methods aim to streamline the discovery process along the pre-clinical and clinical phases and thereby contribute to reducing the cost and time otherwise required. They can be divided into two main categories, ligand-based and structure-based, depending on the source of information used. This work mainly focuses on the description and application of the second category, where the target structure is explored using different approaches, in particular molecular docking and molecular dynamics simulations.

In the first chapter, I have retraced the ‘40-year evolution of molecular docking algorithms’. Their development, together with the advancement of computing architecture and power, has exponentially increased the number of their applications. The essential requirement for ligand docking is the target structure, which can be obtained either from experimentally determined structures (e.g. crystallography, NMR, EM) or from computational methods such as homology modelling. Together with the structure, searching methods and force fields compose the docking engine. Over the years, docking algorithms have evolved from the so-called ‘rigid-docking’ searching methods to the more sophisticated ‘semi-flexible’ and ‘flexible docking’ algorithms. ‘Rigid-docking’ does not consider ligand flexibility: it explores only the six translational and rotational degrees of freedom and refers to the concept of molecular recognition described by the ‘lock and key’ model. The latter two searching methods allow flexibility, of the ligand alone for ‘semi-flexible’ docking and of both ligand and target for ‘flexible docking’. These methods recapitulate two theories, ‘induced fit’ and ‘conformational ensemble’, which better describe certain molecular events such as allosteric modulation.

Molecular docking can be applied not only to investigate a single ligand-binding event but also to screen compound libraries against one or several targets, as one possible strategy to predict side effects or for drug repositioning. To date, the majority of the investigated therapeutic targets have been proteins; therefore, docking programs have mostly been refined with a focus on them. Nucleic acids, in contrast, are emerging therapeutic targets, investigated for cytostatic agents against cancer and for the treatment of bacterial and viral infections. Recently, targeting the spliceosome machinery has raised great interest, offering the advantage of intervening early, at the mRNA level, without altering the genome. One example of mRNA targeting that modulates splicing, for the treatment of spinal muscular atrophy (SMA), is reported in chapter III. Since molecular docking programs have been optimized mostly for proteins, I have analysed literature data on nucleic acid benchmarks of some of the widely used docking programs. Furthermore, similarities and differences between protein and nucleic acid targets have been highlighted, which are intrinsically related to their chemical and structural nature.

In chapter II, I have reported the first case study, where structure-based methods are used to explore a new class of proteases, namely Toxoplasma gondii aspartyl protease 3 (TgASP3) and Plasmodium falciparum plasmepsins IX and X (PfPMIX-X). They are involved in the invasion and egress of the respective obligate parasites. The availability of a potent and selective inhibitor, compound 49c, allowed the characterization of the structural requirements for specifically targeting this class of proteases. Moreover, since no three-dimensional data were available, I have built the structural models of the three proteins by homology. Structural analysis, followed by molecular docking of compound 49c and of the specific substrates, allowed the identification of a critical phenylalanine residue in the flap region that is responsible for substrate and ligand selectivity in this class of proteases. Intriguingly, chemically induced resistant parasites presented a phenylalanine-to-cysteine mutation at this site, which resulted in a loss of activity, evidenced by a threefold increase in the IC50 value of 49c compared to the wild type (WT), without affecting substrate processing. The experimental confirmation of the computational predictions was crucial for validating the models and the applied protocol.

As mentioned earlier, in chapter III I report a second case study, on the discovery of small-molecule splicing modulators targeting TSL2, an RNA motif implicated in the pathogenesis of spinal muscular atrophy (SMA). Results from a combined virtual and phenotypic screening revealed the promising hit compound PK4C9, which is able to correct splicing and restore the necessary levels of the lacking SMN (survival motor neuron) protein in cells from SMA patients. The NMR structure of the TSL2 hairpin allowed me to study the mode of interaction of PK4C9 at the molecular level. This was followed by multiple molecular dynamics simulations using both wild-type and mutant TSL2 structures, in order to elucidate the RNA conformation required to correct splicing. As a major research outcome, TSL2 has been validated as a therapeutic target for the treatment of SMA, and PK4C9 is the first identified small molecule targeting this RNA motif.

To conclude, after a general introduction to structure-based methods, I have described their application in two case studies, on a protein target and a nucleic acid target respectively. In both cases, computational studies were fundamental for describing ligand binding. They successfully guided the rational investigation of the ‘hot-spot’ regions necessary for ligand recognition in the active site and elucidated the structural changes required to produce the therapeutic effect. Finally, these findings help consolidate the crucial role played by computational methods in drug discovery research.

Thesis summary (Résumé de la thèse)

The present work aims to cover different aspects of computational tools, with a particular emphasis on structure-based methods. The main plan of the thesis was the following: a general description of these methods, with a more in-depth analysis of the history and development of molecular docking, before moving on to their application in two case studies and elucidating their impact on drug discovery research.

As an introductory note, I have briefly analysed the complex journey that brings a single molecule to become an approved drug, which on average requires 9 to 15 years, with costs that reach 2-3 billion dollars. In response, computational methods aim to streamline the discovery process throughout the pre-clinical and clinical phases and thereby contribute to reducing the cost and time otherwise required. They can be divided into two main categories, ligand-based and structure-based, depending on the source of information used. This work concentrates mainly on the description and application of the second category, where the target structure is explored using different approaches, in particular molecular docking and molecular dynamics simulations.

In the first chapter, I have retraced the ‘evolution of molecular docking algorithms over 40 years’. Their development, together with progress in computing architecture and power, has exponentially increased the number of their applications. The essential condition for ligand docking is the target structure, which can be obtained either from experimentally determined structures (for example crystallography, NMR, EM) or with computational methods such as homology modelling. Together with the structure, searching methods and force fields make up the docking engine. Over the years, docking algorithms have been developed that made it possible to move from the so-called ‘rigid docking’ searching methods to the more sophisticated ‘semi-flexible’ and ‘flexible docking’ algorithms. ‘Rigid docking’ does not consider ligand flexibility: it explores only the six translational and rotational degrees of freedom and refers to the concept of molecular recognition described by the ‘lock and key’ model. The latter searching methods allow flexibility, of the ligand alone and, for ‘flexible docking’, of both the ligand and the target. These methods recapitulate two theories, ‘induced fit’ and ‘conformational ensemble’, which better describe certain molecular events such as allosteric modulation.

Molecular docking can be applied not only to study a single ligand-binding event but also to screen compound libraries against one or several targets, as one possible strategy to predict side effects or for drug repositioning. To date, the majority of the therapeutic targets studied have been proteins; consequently, docking programs have mostly been refined with a focus on them. Nucleic acids, in contrast, are emerging therapeutic targets, studied for cytostatic agents against cancer and for the treatment of bacterial and viral infections. Recently, targeting the spliceosome machinery has attracted enormous interest, offering the advantage of intervening early, at the mRNA level, without altering the genome. One example of mRNA targeting that modulates splicing, for the treatment of spinal muscular atrophy (SMA), is reported in chapter III. Since molecular docking programs have mainly been optimized for proteins, I have analysed literature data on nucleic acid benchmarks of some of the widely used docking programs. In addition, similarities and differences between protein and nucleic acid targets have been highlighted, which are intrinsically related to their chemical and structural nature.

In chapter II, I have reported the first case study, where structure-based methods are used to explore a new class of proteases, namely Toxoplasma gondii aspartyl protease 3 (TgASP3) and Plasmodium falciparum plasmepsins IX and X (PfPMIX-X). They are involved in the invasion and egress of the respective obligate parasites. The availability of a potent and selective inhibitor, compound 49c, made it possible to characterize the structural requirements for specifically targeting this class of proteases. Moreover, since no three-dimensional data were available, I built the structural models of the three proteins by homology. Structural analysis, followed by molecular docking of compound 49c and of the specific substrates, made it possible to identify a critical phenylalanine residue in the flap region that is responsible for substrate and ligand selectivity in this class of proteases. Intriguingly, chemically induced resistant parasites presented a phenylalanine-to-cysteine mutation at this site, which led to a loss of activity, observed as a threefold increase in the IC50 value of 49c compared to the wild type (WT), without affecting substrate processing. The experimental confirmation of the computational predictions was crucial for validating the models and the applied protocol.

As mentioned previously, in chapter III I report a second case study, on the discovery of small-molecule splicing modulators targeting TSL2, an RNA motif implicated in the pathogenesis of spinal muscular atrophy (SMA). The results of a combined virtual and phenotypic screening revealed the promising hit compound PK4C9, which is able to correct splicing and restore the necessary levels of the lacking SMN (survival motor neuron) protein in cells from SMA patients. The NMR structure of the TSL2 hairpin allowed me to study the mode of interaction of PK4C9 at the molecular level. This was followed by multiple molecular dynamics simulations using both wild-type and mutant TSL2 structures, in order to elucidate the RNA conformation required to correct splicing. As a major research outcome, TSL2 has been validated as a therapeutic target for the treatment of SMA, and PK4C9 is the first identified small molecule targeting this RNA motif.

To conclude, after a general introduction to structure-based methods, I have described their application in two case studies, on protein and nucleic acid targets respectively. In both cases, computational studies were fundamental for describing ligand binding. They successfully guided the rational study of the ‘hot-spot’ regions necessary for ligand recognition in the active site and elucidated the structural changes required to produce the therapeutic effect. Finally, these results help consolidate the crucial role played by computational methods in drug discovery research.

PREFACE


Computational methods in drug discovery research

In the last decade, the long and challenging process of drug discovery research has seen costs increase up to $2.8 billion, based on a report of the Tufts Center for the Study of Drug Development (DiMasi, Grabowski et al., 2016). This will have a major impact on strategic choices in drug discovery campaigns, where promising candidates must be rationalized and prioritized. In this respect, computational methods are valuable tools routinely applied in drug discovery, from which the entire research process can benefit. Their pivotal role is by now established (Jorgensen, 2004), with successful cases of drugs discovered or optimized using computational approaches that are now in the clinic (Talele, Khedkar et al., 2010). Computational methods can assist drug discovery along the pre-clinical phase, from gene and target validation to compound optimization and pharmacological profiling (Figure 1). The methods are chosen based on the source of information available and on the defined aim of the research strategy.

Figure 1: Computer-based drug discovery tools currently used in drug discovery.


We can group these methods into two broad categories, ligand-based and structure-based, depending on the source of information available: the active molecule or the three-dimensional structure of the target of interest, respectively (Figure 2).

The present thesis focuses on structure-based methods applied to drug discovery. The following paragraphs give a general overview of LB and SB methods, with the latter analysed in more depth. An evolutionary perspective on molecular docking then follows (Chapter I), together with its application in drug discovery projects (Chapters II-III).

0.1 Ligand-based (LB) methods

LB methods rely on the information derived from the active small molecule that binds to the desired target or produces the expected biological effect. Among the LB methods, pharmacophore modelling, ligand-based virtual screening (LBVS) and quantitative structure-activity relationship (QSAR) are widely used in drug discovery campaigns, especially when the target structure is unavailable or unknown.

Figure 2: The two main classes of computational methods, ligand-based and structure-based.

LBVS takes advantage of the chemical features or fingerprints of the hit molecule to screen compound libraries by molecular similarity and to select the compounds to be prioritized for experimental testing. Similarly, pharmacophore models can be used to screen libraries, with the advantage of including structural features of the active site inferred from the binding of the active molecule. QSAR methods are normally implemented in the lead profiling and optimization phase (Figure 1). They model the relationship between the structural features of a compound and its biological effect. The advantage is that, after optimisation, QSAR models can be used to predict the biological activity of novel molecules. Today, LB methods have seen a substantial improvement with the implementation of machine-learning techniques, which help to identify patterns in the data sets that can be used to improve the predictions.
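The similarity screening step of LBVS described above can be illustrated with a minimal sketch. It assumes that each molecule is already encoded as a fingerprint, represented here as a set of on-bit indices, and it uses the Tanimoto coefficient, one common similarity measure. The compound names, fingerprints and the 0.5 threshold are hypothetical values chosen for illustration only.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_library(query_fp, library, threshold=0.5):
    """Return (name, similarity) pairs above the threshold, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    kept = [(name, sim) for name, sim in scored if sim >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: the known hit and a three-compound library.
hit = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12, 15},
}

ranked = rank_library(hit, library)
for name, sim in ranked:
    print(f"{name}: {sim:.2f}")
```

Compounds that pass the threshold would be the ones prioritized for experimental testing; in practice the fingerprint type and the cut-off strongly influence which compounds are retrieved.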

0.2 Structure-based (SB) methods

In contrast, SB methods require the three-dimensional structure of the therapeutic target, which is usually obtained experimentally by X-ray crystallography, nuclear magnetic resonance (NMR) or electron microscopy (EM). Knowledge of the structure enables the exploration of the binding site and the evaluation of how well molecules of interest are recognized, in terms of binding affinity. When the target structure is not available, computational methods such as homology modelling can predict it based on sequence similarity to evolutionarily related proteins, which often share similar functions and binding sites. Other possible computational methods for structure prediction are ab initio folding and de novo design (see chapter I section 2.2). As already mentioned, having the target structure allows the exploration of the active site, the prediction of ligand binding affinity and the screening of multiple compounds. Molecular docking and molecular dynamics simulations are two SB methods extensively used by computational chemists for this purpose.

0.2.1 Molecular docking

The primary purpose of molecular docking is to predict the preferential binding mode (also referred to as the binding pose) of the ligand to its biological target. How well the ligand binds to its target is evaluated by binding affinity prediction. The docking engine can be described as a two-step process. In the first step, a sampling algorithm explores different binding poses, allowing the ligand a certain degree of freedom (rotation/translation). In the second step, the generated binding poses are evaluated using a scoring function. Both steps are crucial for a correct binding pose prediction by the docking tool. On the one hand, the sampling method should search all the possible conformations describing the energetic landscape. On the other hand, scoring functions, which can be defined as approximations of the binding affinity, need to be able to discriminate, among all the possible solutions, the native-bound complex from the non-binders.
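The two-step process can be sketched in a few lines of code. The example below is a toy illustration under strong assumptions, not the algorithm of any specific docking program: the ligand is treated as a rigid body, poses are sampled at random (three rotation angles about the coordinate origin and three translations), and the scoring function `toy_score` is a hypothetical contact count that rewards atom pairs at 3-5 Å and penalises clashes below 3 Å. All coordinates are invented.

```python
import math
import random


def rotate(point, ax, ay, az):
    """Rotate a 3D point about the x, y and z axes in turn (angles in radians)."""
    x, y, z = point
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    return (x, y, z)


def place(ligand, angles, shift):
    """Apply one rigid-body pose (three rotations, three translations) to every ligand atom."""
    posed = []
    for atom in ligand:
        x, y, z = rotate(atom, *angles)
        posed.append((x + shift[0], y + shift[1], z + shift[2]))
    return posed


def toy_score(ligand, target):
    """Hypothetical score, lower is better: -1 per contact at 3-5 Å, +10 per clash below 3 Å."""
    total = 0.0
    for a in ligand:
        for b in target:
            d = math.dist(a, b)
            if d < 3.0:
                total += 10.0
            elif d <= 5.0:
                total -= 1.0
    return total


def dock(ligand, target, n_poses=500, seed=0):
    """Step 1: sample random rigid-body poses. Step 2: score each pose and keep the best."""
    rng = random.Random(seed)
    best_pose, best_score = list(ligand), toy_score(ligand, target)
    for _ in range(n_poses):
        angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(3)]
        shift = [rng.uniform(-6.0, 6.0) for _ in range(3)]
        pose = place(ligand, angles, shift)
        score = toy_score(pose, target)
        if score < best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score


# Invented coordinates (in Å): four target atoms and a two-atom rigid ligand.
target = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
ligand = [(-0.75, 0.0, 0.0), (0.75, 0.0, 0.0)]
best_pose, best_score = dock(ligand, target)
print(best_score)
```

Random sampling is used here only for brevity; real docking programs rely on more efficient search strategies and on far more elaborate scoring functions.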

Scoring functions can be divided into four main classes: (1) force-field based, (2) knowledge-based, (3) empirical and (4) consensus.

• Force-field (FF) based scoring functions

FF based scoring functions are developed using classical molecular mechanics. In docking, the estimation of the binding energy normally considers the contribution of non-bonded terms describing the interactions, such as the electrostatic potential, van der Waals interactions and solvation effects. Intramolecular (bonded) contributions can also be added by considering ligand torsional terms. An example of the non-bonded energy contribution is reported in equation (1):

(1)


The first term (red) represents the van der Waals interaction described by the Lennard-Jones potential, whereas the second term (orange) is the electrostatic potential, described by the Coulomb function.
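For orientation, a commonly used form of such a non-bonded energy is sketched below. The symbols are the conventional ones and are assumptions, not necessarily the notation of equation (1): $r_{ij}$ is the distance between ligand atom $i$ and target atom $j$, $A_{ij}$ and $B_{ij}$ are the Lennard-Jones repulsion and attraction parameters, $q_i$ and $q_j$ are partial charges, $\varepsilon_0$ is the vacuum permittivity and $\varepsilon_r$ a relative dielectric constant.

```latex
E_{\text{non-bonded}}
  = \sum_{i \in \text{ligand}} \sum_{j \in \text{target}}
    \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)
  + \sum_{i \in \text{ligand}} \sum_{j \in \text{target}}
    \frac{q_i \, q_j}{4 \pi \varepsilon_0 \varepsilon_r \, r_{ij}}
```

The first double sum corresponds to the van der Waals (Lennard-Jones) term and the second to the electrostatic (Coulomb) term.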

• Knowledge-based scoring functions

These scoring functions are derived from experimentally determined protein-ligand information. The frequencies of occurrence of specific interactions are derived from a large number of target-ligand complexes and are used to generate statistical potentials based on the Boltzmann distribution.

One of the key features of these scoring functions is their high dependency on the data set used to generate them.
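How observed frequencies are turned into a statistical potential can be sketched with the inverse Boltzmann relation, w(r) = -kT ln[p_obs(r) / p_ref(r)]. The example below is a minimal illustration under stated assumptions: the distance bins, the observed counts and the uniform reference state are invented, and kT is taken as about 0.593 kcal/mol (room temperature).

```python
import math

KT = 0.593  # kcal/mol at roughly 298 K (assumed temperature)


def statistical_potential(observed_counts, reference_counts, kt=KT):
    """Inverse Boltzmann: convert per-bin contact counts into pseudo-energies."""
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    potential = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = obs / obs_total
        p_ref = ref / ref_total
        if p_obs == 0 or p_ref == 0:
            potential.append(None)  # undefined where a bin was never observed
        else:
            potential.append(-kt * math.log(p_obs / p_ref))
    return potential


# Hypothetical counts of one atom-pair type in four distance bins.
observed = [5, 40, 30, 25]
reference = [25, 25, 25, 25]
potential = statistical_potential(observed, reference)
print(potential)
```

Bins observed more often than expected receive a negative (favourable) value and rarely observed bins a positive one, which also makes the dependency on the underlying data set explicit: different counts give a different potential.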

• Empirical scoring functions

Empirical scoring functions consider the energy of binding as a sum of the contributions of a number of individual terms derived experimentally. These terms can be the number of hydrogen bonds, the hydrophobic effect, rotations and different types of contacts. Regression analysis is usually performed to obtain the weights of the terms, using experimental target-ligand complexes with known binding free energy data. An example is reported in equation (2):

(2)
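As a generic illustration, and not necessarily the expression of equation (2), an empirical scoring function can be written as a weighted sum of descriptor terms. The particular descriptors chosen below (hydrogen-bond count $N_{\text{hb}}$, lipophilic contact term $f_{\text{lipo}}$, rotatable-bond count $N_{\text{rot}}$) and the weights $w$ are assumptions for the sake of the example.

```latex
\Delta G_{\text{bind}}
  = w_0
  + w_{\text{hb}} \, N_{\text{hb}}
  + w_{\text{lipo}} \, f_{\text{lipo}}
  + w_{\text{rot}} \, N_{\text{rot}}
```

The weights $w_0, w_{\text{hb}}, w_{\text{lipo}}, w_{\text{rot}}$ are the quantities obtained by regression against complexes with known binding free energies.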


• Consensus scoring functions

Consensus scoring is meant to combine multiple scoring functions, in the hope of balancing the errors introduced by each individual scoring function and improving the accuracy of the binding energy prediction.
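One simple way to combine scoring functions is to average the rank that each function assigns to every compound. The sketch below assumes this rank-averaging scheme (other consensus strategies exist) and uses invented scores for three hypothetical ligands, with lower scores meaning better predicted binding.

```python
def consensus_rank(scores_by_function):
    """Average each compound's rank over all scoring functions (rank 1 = best)."""
    compounds = list(next(iter(scores_by_function.values())))
    rank_sum = {compound: 0 for compound in compounds}
    for scores in scores_by_function.values():
        ordered = sorted(compounds, key=lambda c: scores[c])  # lower score first
        for rank, compound in enumerate(ordered, start=1):
            rank_sum[compound] += rank
    n_functions = len(scores_by_function)
    averaged = [(c, rank_sum[c] / n_functions) for c in compounds]
    return sorted(averaged, key=lambda item: item[1])


# Invented scores from three hypothetical scoring functions.
scores = {
    "force_field": {"lig1": -8.2, "lig2": -7.5, "lig3": -9.0},
    "empirical": {"lig1": -6.9, "lig2": -7.1, "lig3": -6.5},
    "knowledge": {"lig1": -120.0, "lig2": -95.0, "lig3": -110.0},
}

consensus = consensus_rank(scores)
for compound, mean_rank in consensus:
    print(f"{compound}: mean rank {mean_rank:.2f}")
```

Working with ranks rather than raw scores avoids mixing scores expressed on different scales, at the price of discarding the size of the score differences.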

Searching methods constitute the second important component of molecular docking. They try to find the best fit between the ligand and the target by generating all the possible binding modes. The number of generated conformations increases exponentially with the number of rotational and translational degrees of freedom taken into account, in other words when flexibility is considered.
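The exponential growth can be made concrete with a small calculation. The sketch below assumes a systematic search in which every rotatable bond is sampled in fixed angular steps; the 30° step and the function name `count_conformations` are illustrative assumptions.

```python
def count_conformations(n_rotatable_bonds, torsion_step_deg=30, rigid_body_poses=1):
    """Size of a systematic search: (360 / step) states per bond, for each rigid-body pose."""
    states_per_bond = 360 // torsion_step_deg
    return rigid_body_poses * states_per_bond ** n_rotatable_bonds


for n_bonds in (0, 2, 5, 10):
    print(n_bonds, count_conformations(n_bonds))
```

With 12 states per bond, a ligand with 5 rotatable bonds already has 12^5 = 248,832 conformations per rigid-body pose, which is why exhaustive enumeration quickly becomes impractical.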

In the following chapter, I will report a more in-depth study of the evolution of docking algorithms and searching methods, with their potential usage as a multi-purpose tool.

0.2.2 Molecular dynamic simulations

Another widely used structure-based method is molecular dynamics (MD), which simulates a biological event or process as a function of time. Classical molecular dynamics is based on the numerical integration of Newton's equation of motion (3):

F(t) = m × a(t)    (3)

where F(t) represents the force exerted at time t on an atom of mass m, and a(t) is the acceleration of that atom.
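Numerical integration of equation (3) can be illustrated on a one-dimensional toy system. The sketch below uses the velocity Verlet scheme, a widely used integrator chosen here as an assumption, and a harmonic force F(x) = -k·x standing in for a real force field. The mass, force constant and time step are arbitrary reduced units.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate F = m * a with the velocity Verlet scheme; return positions and final velocity."""
    positions = [x]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # update position
        a_new = force(x) / mass              # acceleration from the new forces
        v = v + 0.5 * (a + a_new) * dt       # update velocity with the averaged acceleration
        a = a_new
        positions.append(x)
    return positions, v


K = 1.0     # force constant (arbitrary units)
MASS = 1.0  # particle mass (arbitrary units)


def harmonic_force(x):
    """Toy force standing in for the forces computed from a force field."""
    return -K * x


positions, v_end = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                                   mass=MASS, dt=0.01, n_steps=1000)
energy = 0.5 * MASS * v_end ** 2 + 0.5 * K * positions[-1] ** 2
print(positions[-1], energy)
```

For this system the exact trajectory is x(t) = cos(t) and the total energy is 0.5, so the quality of the integration can be checked directly; in an MD simulation the same update is applied to every atom at every time step, which explains the computational cost.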

MD makes it possible to understand the motion of the target and its biological relevance, and for this reason it has a wide range of applications. For example, it can be used to assess the ligand-binding stability of an investigated complex. It can also characterize key movements that are crucial for explaining drug action, for example in allosteric modulation. Moreover, it might reveal ‘cryptic’ cavities, which are normally difficult to identify in experimental structures. The typical time scale of an MD simulation is in the order of nanoseconds to microseconds, which allows biologically important conformational transitions to be captured. Simulating on a millisecond time scale makes it possible to overcome high-energy barriers and explore energetic minima. Unfortunately, the major drawback of this tool is the high computational cost required to simulate such events. The advent of GPUs, supercomputers and enhanced sampling techniques helps to overcome this limitation and allows its routine use.

0.3 Workflow and aim of the work

The present work proposes a general overview of the use of computational methods in drug discovery, highlighting their crucial role in streamlining the effort. As already mentioned in this preface, a major focus is on SB methods. Their application in research projects carried out during my doctoral studies is then presented.

Among these methods, molecular docking has seen an incredible evolution since its first appearance. The integration of novel searching algorithms tries to compensate for and overcome its main limitation, the treatment of target flexibility. With the advancement of computing architectures, molecular docking has been extended from simulating the molecular recognition of a single ligand-target complex to screening large libraries. As a consequence, the method can be used for a multitude of purposes, ranging from peptide docking and polypharmacology to nucleic acid docking.

The application of these SB methods has demonstrated their central role in the context of two case studies. (1) The first study explores a novel class of aspartyl proteases, Toxoplasma gondii ASP3 and Plasmodium falciparum PMIX-X, which are crucial during the mechanism of infection of these obligate parasites (Mukherjee, Tessaro et al., 2018). (2) The second study focuses on the emerging field of small molecules targeting RNA. The non-coding mRNA motif TSL2 is implicated in the modulation of the splicing machinery and in the production of the survival motor neuron (SMN) protein, which is lacking in spinal muscular atrophy (SMA) (Garcia-Lopez, Tessaro et al., 2018b).

0.4 Another example of drug discovery case study using SB methods

As part of my doctoral training, I have also conducted another project, related to the discovery of novel antibiotics (Ghattas, Eissa et al., 2019). In particular, I was responsible for performing both experimental and computational studies for the characterization of a library of compounds bearing the rhodanine scaffold and targeting the Staphylococcus aureus enoyl-acyl carrier protein reductase (SaFabI). This enzyme catalyses the last step in bacterial fatty acid biosynthesis and is considered a validated target for the development of antibacterial drugs. Compound 32 appears to be the most potent of the series, totally inhibiting SaFabI activity at a concentration of 30 µM and presenting an IC50 value of 0.92 µM. Using docking and subsequent molecular dynamics simulations, I was able to elucidate the binding mode of this class of compounds in the active site of the SaFabI enzyme, where the key catalytic residues interact with the inhibitor. Importantly, compound 32 maintains its activity against resistant S. aureus strains as well, with MIC values of 4 µg/ml. This is a crucial feature for a new generation of antibacterial compounds.

0.5 References

1. DiMasi JA, Grabowski HG, Hansen RW (2016) Innovation in the pharmaceutical industry: New estimates of R&D costs. Journal of Health Economics 47: 20-33


2. Garcia-Lopez A, Tessaro F, Jonker HRA, Wacker A, Richter C, Comte A, Berntenis N, Schmucki R, Hatje K, Petermann O, Chiriano G, Perozzo R, Sciarra D, Konieczny P, Faustino I, Fournet G, Orozco M, Artero R, Metzger F, Ebeling M et al. (2018) Targeting RNA structure in SMN2 reverses spinal muscular atrophy molecular phenotypes. Nat Commun 9: 2032

3. Ghattas MA, Eissa NA, Tessaro F, Perozzo R, Scapozza L, Obaid D, Atatreh N (2019) Structure-based drug design and in vitro testing reveal new inhibitors of enoyl-acyl carrier protein reductases. Chem Biol Drug Des 94: 1545-1555

4. Jorgensen WL (2004) The many roles of computation in drug discovery. Science 303: 1813-1818

5. Mukherjee B, Tessaro F, Vahokoski J, Kursula I, Marq JB, Scapozza L, Soldati-Favre D (2018) Modeling and resistant alleles explain the selectivity of antimalarial compound 49c towards apicomplexan aspartyl proteases. EMBO J 37

6. Talele TT, Khedkar SA, Rigby AC (2010) Successful applications of computer aided drug discovery: moving drugs from concept to the clinic. Curr Top Med Chem 10: 127-141


CHAPTER I

STRUCTURE-BASED COMPUTATIONAL METHODS IN DRUG DISCOVERY: how ‘protein-docking’ translates into the new emerging field of docking small molecules to nucleic acids?


Francesca Tessaro and Leonardo Scapozza

Pharmaceutical Biochemistry, School of Pharmaceutical Sciences and Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, CMU, Rue Michel-Servet 1, 1211 Geneva 4, Switzerland.

Published in:

Molecules, 2020; 25(12): 2749 DOI: 10.3390/molecules25122749


Abstract

In this chapter, we retrace the ‘40-year evolution’ of molecular docking algorithms. Over the years, their development has allowed progress from the so-called ‘rigid-docking’ searching methods to the more sophisticated ‘semi-flexible’ and ‘flexible docking’ algorithms. Together with the advancement of computing architecture and power, the applications of molecular docking have also increased exponentially, from single-ligand binding calculations to large-scale screening and polypharmacology profiles. Recently, targeting nucleic acids with small molecules has emerged as a valuable therapeutic strategy, especially for cancer treatment and for bacterial and viral infections. For example, therapeutic intervention at the mRNA level makes it possible to overcome the problem of undruggable proteins without modifying the genome. Despite the promising therapeutic potential of nucleic acids, molecular docking programs have been optimized mostly for proteins. Here, we have analyzed literature data on nucleic acids to benchmark some of the widely used docking programs. Finally, the comparison between protein and nucleic acid target docking highlights similarities and differences that are intrinsically related to their chemical and structural nature.

1. Introduction

Computational methods are now routinely applied in drug discovery campaigns, and their positive impact on research is recognized by the scientific community. Computer-aided drug discovery (CADD) emerged in the 1970s, and its continuous progress, together with advances in computer technology, has made it an indispensable tool. CADD methods are usually divided into two main groups: ligand-based and structure-based. The first approach builds structure-activity models based on ligand structure information, whereas the second takes advantage of the three-dimensional structure of the biomolecular target to investigate ligand binding. Among the structure-based methods, molecular docking is a well-known tool for investigating ligand-target recognition. In this chapter, we explore the evolution of molecular docking from single ligand-protein calculations to more recent and sophisticated applications, with particular emphasis on docking applied to nucleic acids (NA) as emerging therapeutic targets.

2. Nature of biomolecular target for structure-based methods

2.1 Experimentally determined structure

The Protein Data Bank (PDB) (https://www.rcsb.org/) is a constantly updated repository of experimental macromolecular structures. By May 2020, the PDB contained ~163,633 experimental structures of proteins, RNAs, and DNAs from different organisms. The majority of the structures were solved by X-ray crystallography (~145,437) (https://www.rcsb.org/stats/summary), but nuclear magnetic resonance (NMR) and electron microscopy (EM) techniques also contribute strongly to the growth of the collection. The introduction of novel protocols and methods in X-ray crystallography has greatly improved structure resolution compared with the first solved myoglobin structure in 1960 (Kendrew, Dickerson et al., 1960).

Nevertheless, structural information is still lacking for many biological macromolecules. Other techniques such as NMR are preferred for small systems and are particularly suitable for nucleic acids such as small RNA or DNA sequences, where information on structural dynamics might be crucial for investigating the biological event. Cryo-EM, in turn, has evolved from a method for studying relatively large systems, such as cell organelles or macromolecular complexes, at substantially lower resolution into a high-resolution technique that reaches better than 3 Å for structures as small as 64 kDa (Herzik, Wu et al., 2019). This method also has the advantage of requiring a smaller amount of material than X-ray or NMR techniques.

2.2 Computational methods for structure prediction

The PDB represents the largest source of structural data on biomolecules suitable for structure-based computational tools, including molecular docking. Despite the increasing number of structures deposited in the PDB, we are far from having structural information for all biologically relevant proteins. The number of non-redundant protein sequences is estimated at 152 million by the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/refseq/). This represents an enormous gap between sequence annotation and available 3D structures. In addition, the rate at which new protein sequences are defined far exceeds the rate at which protein structures are experimentally determined. Hence, in the absence of experimental structures, computational predictions have emerged as a valuable alternative for structure-based studies. These computational methods can be divided into two main categories: ‘template-based’ and ‘free-modelling’. Among them, template-based methods, homology modelling in particular, are the most accurate for binding site identification and drug design purposes (Moult, Fidelis et al., 2018). It has been estimated that, with the use of homology models, coverage of the human proteome can reach 70% at ≥ 30% sequence identity, which represents around 95% of the human drug targets used for therapeutic intervention (Somody, MacKinnon et al., 2017).

Homology modelling (or comparative modelling) consists of building target models based on sequence similarity with a homologous template of known 3D structure. The general assumption of this methodology is that evolutionarily related biomolecules with similar sequences tend to have similar structures. Normally, in the case of proteins, a sequence identity with the template greater than ~30% is considered the threshold for a model suitable for structure-based studies, whereas for drug design purposes a more stringent 50% identity is required for a reliable model (Hillisch, Pineda et al., 2004). Sequence alignment and model validation are both crucial steps in obtaining a reliable model. The Ramachandran plot, for example, evaluates backbone geometry by reporting the torsion angles ϕ and ψ of each residue, which should fall within the allowed regions of the graph (Kendrew et al., 1960).
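The identity thresholds quoted above can be illustrated with a minimal sketch. It assumes the target and template have already been aligned (gaps written as `-`) and that identity is counted over ungapped columns only; other conventions for the denominator exist. The function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are ignored (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def model_suitability(identity: float) -> str:
    """Map identity onto the rough thresholds quoted in the text (~30% and 50%)."""
    if identity >= 50.0:
        return "drug design"
    if identity > 30.0:
        return "structure-based studies"
    return "below usual threshold"


# Made-up aligned fragments, for illustration only.
identity = percent_identity("ACDE-FG", "ACDQYFG")
label = model_suitability(identity)
```

The thresholds are rules of thumb; a model near either cutoff would still need the alignment and validation checks described above.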

Homology modelling has been used for many applications in structure-based studies, from target identification, site-directed mutagenesis, and binding site inspection to druggability analysis. Structural models obtained by homology represent the starting point for other computational methods, notably molecular docking and molecular dynamics simulations. One example in which homology models were used as a starting source of information is the structure-based investigation of a novel class of aspartic proteases, Toxoplasma gondii aspartyl protease 3 and Plasmodium falciparum plasmepsin IX/X, where the application of molecular docking allowed us to unravel the atomistic mode of action of a nanomolar inhibitor (49c) (Mukherjee et al., 2018).


3. “40 years of protein docking highlights and advancements”

In the 40 years since the first application of docking, huge advancements have been made both in search algorithms and in scoring functions, allowing its use across a broad range of biomolecular systems. Here, we highlight some of the important milestones that contributed to the history and development of molecular docking theory and applications.

3.1 An evolutionary perspective: from rigid to flexible docking

Over the years, improvements in docking performance have mainly focused on balancing the correctness of the predicted pose against the speed of the search algorithm. The development of docking algorithms was also favoured by the introduction of modern computers and parallelization, which substantially increased the speed of the calculations. The complexity of the algorithms can be classified according to the number of degrees of freedom they neglect (Figure 3).

Figure 3: Evolutionary timeline highlighting key events contributing to the development of molecular docking algorithms, starting from rigid to flexible docking.


In 1982, the DOCK program (Kuntz, Blaney et al., 1982) was introduced; it was designed to find molecules that match by shape complementarity. Molecular and geometrical shape algorithms, also called matching algorithms, were first applied in rigid-docking calculations, where the two molecules are treated as rigid bodies and only the six translational and rotational degrees of freedom are explored. In the matching algorithms, the binding site is represented as a collection of spheres of different radii, which allows mapping the ligand atoms onto the sphere centres and finding matching sets (Kuntz et al., 1982). Rigid docking is fast and has proved very effective in database exploration (Shoichet, Kuntz et al., 1992). However, one of its major drawbacks is that it does not consider ligand flexibility and relies only on pre-calculated conformations of the ligands. This method reflects the earliest concept of molecular recognition: the ‘lock and key’ model introduced by Fischer (Fischer, 1894). Because the limited number of degrees of freedom reduces the complexity of the system, rigid docking was also successfully applied to protein-protein and protein-peptide docking (Stoddard & Koshland, 1992).
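The six rigid-body degrees of freedom mentioned above (three translations and three rotations) can be made concrete with a short sketch. It is not the DOCK matching algorithm; it only shows how a single pose vector moves a rigid ligand while leaving its internal geometry untouched. The rotation convention (R = Rz·Ry·Rx about the origin), the function names, and the coordinates are assumptions chosen for illustration.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation matrix R = Rz * Ry * Rx built from three angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, pose):
    """Apply a rigid-body pose (tx, ty, tz, rx, ry, rz) to a list of (x, y, z) atoms."""
    tx, ty, tz, rx, ry, rz = pose
    r = rotation_matrix(rx, ry, rz)
    moved = []
    for x, y, z in coords:
        moved.append((
            r[0][0] * x + r[0][1] * y + r[0][2] * z + tx,
            r[1][0] * x + r[1][1] * y + r[1][2] * z + ty,
            r[2][0] * x + r[2][1] * y + r[2][2] * z + tz,
        ))
    return moved


# Made-up three-atom "ligand"; rotate 90 degrees about z, then translate.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
posed = apply_pose(ligand, (1.0, 2.0, 3.0, 0.0, 0.0, math.pi / 2))
```

Because only these six numbers vary, interatomic distances within the ligand are preserved, which is exactly why rigid docking must rely on pre-calculated ligand conformations.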

Ligand conformational freedom was first introduced with ligand-flexible docking methods, also referred to as semi-flexible docking. Systematic algorithms, for example, generate all possible ligand conformations in order to find the best ligand-binding match. Fragmentation and incremental construction (IC) methods belong to the systematic algorithms. With these algorithms, each ligand is divided into small fragments, which are then rigidly docked into the binding site (DesJarlais, Sheridan et al., 1986). Only once the anchor fragment is placed is the rest of the molecule joined and energy-minimized in the binding site. This method works well with small to medium-sized molecules, whereas with bigger molecules the number of generated fragments increases and might create difficulties in pose prediction. On the one hand, a systematic search of the ligand conformations is more accurate; on the other, it requires substantial computational resources and time. Stochastic algorithms, by contrast, offer a good trade-off between speed and the chance of obtaining the correct binding pose. Among them, the Monte Carlo algorithm is a well-known stochastic method: at each iteration, the ligand undergoes a random bond rotation, rigid-body translation, or rotation. The binding pose is evaluated by a molecular-mechanics energy calculation, and the standard Metropolis criterion, based on the Boltzmann probability, is then applied to accept or reject the pose for the next iteration. A particular advantage of the Monte Carlo method is the possibility of exploring diverse local energy minima by using higher temperature values, thereby overcoming energy barriers (Hart & Read, 1992). The Tabu method was introduced a few years later with the aim of improving the Monte Carlo conformational search by preventing the exploration of already sampled regions.
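The Metropolis acceptance step described above can be sketched as follows. This is a toy illustration, not the implementation of any docking program: the one-dimensional double-well `toy_energy` stands in for a molecular-mechanics score, and the step size, seed, and function names are assumptions.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def metropolis_accept(e_old, e_new, temperature, rng=random):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE / kT)."""
    delta = e_new - e_old
    if delta <= 0.0:
        return True
    return rng.random() < math.exp(-delta / (K_B * temperature))


def toy_energy(x):
    """Hypothetical double-well energy (kcal/mol) standing in for a docking score."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def monte_carlo_search(x0, steps, temperature, step_size=0.3, seed=0):
    """Random-walk search over one coordinate; returns the best state visited and its energy."""
    rng = random.Random(seed)
    x, e = x0, toy_energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)  # random perturbation of the "pose"
        e_new = toy_energy(x_new)
        if metropolis_accept(e, e_new, temperature, rng):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = monte_carlo_search(x0=1.0, steps=500, temperature=300.0)
```

Raising `temperature` increases the probability of accepting uphill moves, which is the mechanism by which the search can cross the barrier between the two wells.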

Genetic algorithms (Jones, Willett et al., 1995, Oshiro, Kuntz et al., 1995) also belong to the stochastic methods and take inspiration from Darwin’s theory of biological evolution. Each gene represents a degree of freedom of the ligand, and thus each chromosome describes a ligand conformation. As happens in nature, mutations and crossovers occur and define new populations of conformers, each of which is assigned a fitness score. Only the docking poses with good scores are selected and carried over to the next iteration.
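The gene, chromosome, and fitness vocabulary above maps onto code roughly as in the sketch below. It is a generic genetic algorithm, not the scheme of the cited programs: each gene is one torsion angle in degrees, the fitness is a hypothetical distance to an arbitrary target conformation, and the population size, mutation rate, and selection rule are assumptions.

```python
import random

TARGET = [60.0, 180.0, -60.0]  # hypothetical "ideal" torsions, for illustration only


def wrap(angle):
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def toy_score(chromosome):
    """Hypothetical fitness (lower is better): squared angular distance to TARGET."""
    return sum(wrap(g - t) ** 2 for g, t in zip(chromosome, TARGET))


def crossover(a, b, rng):
    """One-point crossover between two parent chromosomes."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(chromosome, rng, rate=0.2, sigma=20.0):
    """Perturb each gene with probability `rate` by Gaussian noise."""
    return [wrap(g + rng.gauss(0.0, sigma)) if rng.random() < rate else g
            for g in chromosome]


def evolve(pop_size=20, generations=30, seed=0):
    """Return the best chromosome found and the best score recorded per generation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in TARGET] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=toy_score)
        history.append(toy_score(ranked[0]))
        parents = ranked[: pop_size // 2]  # selection: the fitter half survives unchanged
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = min(population, key=toy_score)
    history.append(toy_score(best))
    return best, history


best, history = evolve()
```

Keeping the fitter half unchanged means the best score can never get worse from one generation to the next; real docking programs use their own selection and scoring schemes.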

Phenomena such as allosteric modulation or certain kinetic mechanisms cannot be explained by the early ‘lock and key’ theory. Therefore, the generally accepted model of the molecular binding event shifted toward the more sophisticated concept of the ‘induced-fit’ mechanism, which better explains the aforementioned events (Koshland, 1958). Alongside these theoretical models, the ‘conformational ensemble’ theory was also proposed (Monod, Wyman et al., 1965), supporting the idea that different conformers exist for the same protein and that the ligand preferentially binds one of them. In this sense, the ideal molecular docking engine would allow conformational freedom for both ligand and protein. Accounting for protein flexibility is therefore fundamental for explaining many biological processes around the ligand-binding event (Teague, 2003). Molecular dynamics simulations can describe the dynamic behaviour of a macromolecular system as a function of time. However, the major drawback of their application is the computational cost required to analyse the energy landscape of the system exhaustively. Indeed, for large systems the energy barriers that separate the possible binding modes might be too high to be overcome, which might lead to inadequate conformational sampling.

Before the advent of molecular dynamics, various methods tried to address protein flexibility. For example, ‘soft docking’ (Jiang & Kim, 1991) allows a partial overlap of ligand and protein atoms by decreasing the van der Waals repulsion energy term employed by the scoring function. On the one hand, this method is computationally very efficient and suitable for testing large libraries; on the other hand, the protein atoms remain essentially fixed, so only small movements are approximated.
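The effect of decreasing the repulsion term can be illustrated with a 12-6 Lennard-Jones potential whose repulsive part is multiplied by a scale factor. This is only a sketch of the general idea; it is not the functional form used by Jiang & Kim, and the `repulsion_scale` parameter and the numerical values are assumptions.

```python
def lennard_jones(r, r_min, epsilon, repulsion_scale=1.0):
    """12-6 Lennard-Jones energy with an optional down-scaled repulsive term.

    repulsion_scale = 1.0 gives the standard potential (minimum -epsilon at r = r_min);
    values below 1.0 soften the r^-12 wall, so small atomic overlaps cost less energy.
    """
    if r <= 0.0:
        raise ValueError("distance must be positive")
    ratio6 = (r_min / r) ** 6
    return epsilon * (repulsion_scale * ratio6 * ratio6 - 2.0 * ratio6)


# Made-up contact that is 20% shorter than the optimal distance r_min = 3.8 Å.
clash = 0.8 * 3.8
hard = lennard_jones(clash, 3.8, 0.15)                       # standard repulsion
soft = lennard_jones(clash, 3.8, 0.15, repulsion_scale=0.5)  # softened repulsion
```

With the standard potential this contact is penalized, whereas the softened version tolerates it. Down-scaling the repulsion also shifts the position and depth of the energy minimum, which is one reason soft docking trades accuracy for speed.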

Just a few years later, Leach (Leach, 1994) introduced rotamer libraries composed of alternative conformations of amino acid side chains. The use of rotamer libraries speeds up the sampling search and avoids ligand-pose minimization, which makes this approach appealing for its speed. However, all these strategies consider a single protein conformation. Thus, large movements, with the exception of side-chain rotations, are omitted. The ‘conformational ensemble’ docking approach recalls the homonymous theory (Monod et al., 1965), where the target exists as a collection of conformers derived from molecular dynamics or Monte Carlo simulations. Based on this theory, the ligand binding event can occur either by recognition of the preferred protein conformation or by a mutual conformational rearrangement that accommodates the ligand in its binding site as an induced-fit effect (Koshland, 1958). An alternative way to consider the ensemble of conformers was introduced by Knegtel (Knegtel, Kuntz et al., 1997), where an energy- and geometry-weighted average grid is used to describe the receptor binding site. This approach was further improved by replacing the average grid with a representation that merges the conserved moieties while creating alternative conformations for the remaining parts (Claussen, Buning et al., 2001). Today, the dynamic behaviour of proteins is widely accepted and embraced by the scientific community. Sampling strategies for an exhaustive search of protein conformations have been explored over the years (Amaro, Baudry et al., 2018, Salmaso & Moro, 2018). Finally, the last decade of molecular docking advancements has seen the integration of machine-learning methods, and increasingly of deep learning, which holds great promise for computational drug discovery research (Torres, Sodero et al., 2019).

3.2 Molecular docking as crossing tools for multiple scopes

Molecular docking was originally conceived as an in silico tool for investigating target-ligand interactions, especially in support of drug discovery campaigns in the screening, de novo design, or ligand optimization phases. Advancements in algorithms, scoring functions, computer architectures (GPUs), parallelization, and artificial intelligence reconfirm the crucial role of molecular docking in science (Pinzi & Rastelli, 2019). Indeed, it is now employed and integrated in a variety of discovery tasks, not only for large library screening but also as reverse docking for target profiling in drug repositioning, polypharmacology, and prediction of adverse effects (Pinzi & Rastelli, 2019, Westermaier, Barril et al., 2015), as shown in Chart 1.


Recent work highlights how large screening protocols involving millions of compounds will positively influence early drug discovery programs and allow the identification of novel chemotypes for their respective targets (Lyu, Wang et al., 2019). The trustworthiness of molecular docking also depends on the accuracy of the scoring function used. Recent reviews have described the strengths and weaknesses of classical scoring functions as well as more recent advances in scoring function development (Li, Fu et al., 2019, Wenying, Xuanyi et al., 2020). Here we would like to highlight that artificial intelligence and statistical analysis have emerged as tools to improve binding affinity prediction and scoring functions, taking advantage of the ever-growing publicly available databases (Lavecchia, 2015). One of the latest applications is covalent docking or ‘reactive docking’, where results are very encouraging despite the challenging task of handling ligand bond formation, often using classical molecular mechanics force fields (Bianco, Forli et al., 2016, Gupta, Sharma et al., 2018, Kumalo, Bhakat et al., 2015). In this sense, new methodologies and scoring functions need to be developed to improve scoring accuracy without increasing the computational cost and hindering application to large-scale screening (Cavasotto, Adler et al., 2018).

Chart 1: The increasing number of molecular docking applications. The dark blue compartment shows the new applications that add to the existing ones.

Moreover, as mentioned above, one of the major drawbacks of molecular docking is its limited ability to handle protein flexibility. Even with the implementation of several strategies (Jiang & Kim, 1991, Knegtel et al., 1997, Leach, 1994), docking does not allow extended conformational changes of the protein structure, especially at the backbone level. To overcome these limitations, docking is often integrated with molecular dynamics and thermodynamic calculations, both for exploring the conformational ensemble and for further pose refinement (Amaro et al., 2018, De Vivo, Masetti et al., 2016, Salmaso & Moro, 2018, Totrov & Abagyan, 2008). Finally, molecular docking is not limited to small molecules recognizing proteins; it can also be used to dock different types of macromolecules, as in protein-protein and peptide-protein docking, and, last but not least, to target nucleic acids.

4. Nucleic acids as emerging therapeutic targets

Just 1.5% of the human genome encodes proteins, and only 0.05% has been successfully drugged with existing small molecules (Somody et al., 2017). Considering the latter number, it appears clear that the limited chemical coverage of the human genome stems mainly from the difficulty of targeting the so-called ‘undruggable’ proteins. Disease-associated proteins represent a small fraction of the whole human genome. Therefore, if we consider that nucleic acids can potentially be modulated by small molecules, the therapeutic landscape of macromolecular targets expands considerably, as exemplified by the substantial fraction (70%) of the human genome transcribed into non-coding RNA (Somody et al., 2017). DNA and RNA play essential roles in many biological processes and represent an important class of drug targets (Hurley & Boyd, 1988). They can be distinguished by their base and sugar composition, which results in different functional roles. Indeed, while DNA is the long-term storage of genetic information, RNA is responsible for the transfer of genetic information from the nucleus to the ribosome, allowing protein production. From the structural point of view, DNA is composed of four nucleotide bases: adenine, guanine, cytosine, and thymine; RNA contains uracil instead of thymine. Another structural difference between DNA and RNA is the sugar composition: 2-deoxyribose in DNA is replaced by ribose in RNA. Nucleotides are connected through their sugars by phosphodiester bonds, which constitute the backbone of the DNA or RNA strand.

Small molecules interact with nucleic acids through different mechanisms: intercalation, cross-linkage, strand cleavage, and reading molecules. In particular, DNA binders can interfere with the replication process, which consequently affects the transcription phase, gene expression regulation, and cell proliferation. Similarly, RNA binders might interfere with either the transcription or the translation process, which could affect the splicing machinery. Consequently, nucleic acids appear as suitable targets for various diseases, not only in the chemotherapeutic area, including antivirals and antibacterials (Wang, Yu et al., 2016), but also in genetic diseases.

4.1. DNA-targeting for cancer and antimicrobial therapy

DNA is considered one of the main molecular targets in chemotherapy (Ali & Bhattacharya, 2014). The discovery of the first alkylating agents goes back to chemical warfare during World War II, which spawned the modern era of cancer therapy (Kohn, 1996). Unfortunately, their non-specific interaction with DNA causes off-target effects and high toxicity, partially limiting their clinical use today. In addition, the majority of the first DNA binders caused double-strand breaks, which were beneficial for blocking tumor progression; as a drawback, this damage could lead to novel mutations and cause secondary tumors, which would appear 10–15 years later. Consequently, much more effort has been put into the development of DNA binders without genotoxic effects. This resulted in molecules classified as groove binders, intercalators, and covalent binders. One notable example of a DNA target is the oncogene c-Myc, which is over-activated in many types of cancer; the stabilization of its G-quadruplex structure inhibits telomerase activity (Dutta, Debnath et al., 2018) (Figure 4A). The blockage of cell proliferation has also raised interest as a therapeutic strategy for antibacterial treatment. Today, the potential of using DNA as a target to overcome bacterial resistance is underestimated; different strategies are used to increase the therapeutic index, such as increasing the selective uptake of the drug by the microbes or designing compounds with sequence selectivity (Bolhuis & Aldrich-Wright, 2014). Examples of DNA-binding small molecules for cancer and anti-infective therapy are reviewed in other works (Ali & Bhattacharya, 2014, Barrett, Gemmell et al., 2013, Bolhuis & Aldrich-Wright, 2014, Cimino-Reale, Zaffaroni et al., 2016, Hurley, 2002, Kohn, 1996, Wilson, Tanious et al., 2008).


Figure 4: Examples of nucleic acid targets.

A. Structure of the G-quadruplex oncogene c-Myc shown with transparent surface and backbone as cartoon style (PDB code: 6AU4).

B. Example of riboswitch from Glutamine II Riboswitch (PDB code: 6QN3), represented with transparent surface and the backbone as cartoon style.

C. Representation of the HIV trans-activation response (TAR) RNA bound to a high affinity macrocycle (PDB code: 6D2U). The electrostatic surface highlights the charge distribution, whereas the ligand bound is shown in licorice (green).

D. Structure of TSL2 hairpin bound with the small molecule splicing modulator PK4C9 (green). The RNA structure is shown in grey cartoon style and the ligand as licorice.


4.2 RNA as antiviral and antibacterial target

Transcription and translation can also be regulated at the RNA level, and RNA targeting has emerged as a suitable approach for antibacterial and antiviral therapy. Riboswitches are regulatory parts of the mRNA that control gene expression and consequently protein biosynthesis (Figure 4B). Natural ligands bind to riboswitches and modulate protein gene products by upregulating or downregulating translation or by altering mRNA stability. The therapeutic potential of this regulatory mRNA has been widely explored for antibacterial targeting, especially because riboswitches show high selectivity for their bound ligand (McCown, Corbino et al., 2017). Ribosomal RNA (rRNA), which is directly involved in translation, is another validated target for antibacterial therapy. The bacterial ribosome is composed of two subunits: the 30S subunit is responsible for the correct positioning and reading of the tRNA, and the 50S subunit forms the peptide bond. Aminoglycosides are a known class of antibiotics that induce mistranslation of the polypeptide chain and thus interfere with protein biosynthesis (Krause, Serio et al., 2016). RNA targeting has also raised interest for antiviral therapy. HIV trans-activation response (TAR) RNA is considered a valid target for HIV infection and can bind macrocyclic molecules that have anti-HIV activity (Figure 4C).

4.3 mRNA triggering splicing machinery

The splicing event consists of the removal of introns from the pre-mRNA sequence to generate the mature form of the mRNA, which then constitutes the template for protein translation. The spliceosome machinery is composed of multiple proteins that specifically recognize RNA motifs. A sophisticated mechanism of interactions among the different constituents precisely governs this process. Indeed, any mutation or error during this transformation might cause a malfunction that is directly translated to the protein level (e.g., truncated proteins, misfolding, over- or under-expression). Targeting splicing is an astute intervention because it acts at an early stage of gene expression without altering the genome (Havens, Duelli et al., 2013). Many strategies are now being developed for splicing modification, which are used to increase specific alternatively spliced isoforms or to correct aberrant gene expression caused by mutations that alter splicing (Havens et al., 2013). For example, spinal muscular atrophy (SMA), a life-threatening neurodegenerative disease, is the most common genetic cause of infant mortality. Mutations in the survival motor neuron 1 (SMN1) gene directly affect the production levels of the SMN protein. A second copy of the gene, SMN2, exists, but because of a different alternative splicing pattern it generates low SMN protein levels, which correlate with the severity of the disease. It has been demonstrated that modulating the splicing of the SMN2 gene promotes exon 7 (E7) inclusion and restores the lacking levels of SMN protein. Thus, this is a valid strategy for SMA treatment (Campagne, Boigner et al., 2019, Marquis, Meyer et al., 2007, Naryshkin, Weetall et al., 2014). In chapter III, we describe the first target-based study of small-molecule splicing modifiers binding the RNA terminal stem loop 2 (TSL2) (Garcia-Lopez, Tessaro et al., 2018a). PK4C9 was identified as a promising TSL2-binding hit, able to modify splicing and restore SMN protein levels in SMA cells (Figure 4D) (Garcia-Lopez et al., 2018a).
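The difference between exon 7 inclusion and skipping can be pictured with a toy model of splicing as string concatenation. The segment names follow the text, but the sequences are short made-up placeholders, not SMN2 sequence data, and the `splice` helper is hypothetical.

```python
def splice(segments, skip=()):
    """Join exons in order, dropping all introns and any exon whose name is in `skip`."""
    return "".join(seq for kind, name, seq in segments
                   if kind == "exon" and name not in skip)


# Placeholder pre-mRNA: (kind, name, sequence); sequences are invented for illustration.
pre_mrna = [
    ("exon", "E6", "AUGGCU"),
    ("intron", "I6", "GUAAGUCAG"),
    ("exon", "E7", "GGAAAC"),
    ("intron", "I7", "GUGAGUUAG"),
    ("exon", "E8", "UUCUAA"),
]

full_length = splice(pre_mrna)               # exon 7 included
e7_skipped = splice(pre_mrna, skip={"E7"})   # exon 7 skipped
```

In this picture, a splicing modifier shifts the balance from the shorter transcript toward the full-length one without changing the underlying gene sequence.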

5. Current ‘Protein Docking Algorithms’ Applied to Nucleic Acids: Challenges, Solutions, and Pitfalls

Since molecular docking was first developed, the majority of the therapeutic targets investigated have been proteins, and the refinement of docking programs has therefore focused mainly on these.

A number of docking programs were modified and retrained in order to be applied also to nucleic acids (NA), whereas others were newly designed. Huge advances have been made to ameliorate
