Knowledge discovery in pharmaceutical sciences, from metabolomics to molecular modelling


BOCCARD, Julien

Abstract

With the emergence of new research fields that benefit from innovative technologies, the amount of data available to characterise a phenomenon of interest is constantly increasing. Whereas experimenters used to produce only a limited amount of data that could be analysed by hand, the technological evolution of laboratory instruments and computing has led to a fundamental change in the way data analysis is conceived. Now that modern research methodologies produce data at an exponential rate, it is no longer conceivable to exploit them with a traditional approach. Although human supervision remains a fundamental prerequisite, the processing and extraction of relevant information are becoming increasingly automated. Moreover, classical statistical tools are not suited to the high-dimensional data that characterise these approaches today. In this context, multivariate analysis methods have emerged to address these new [...]

BOCCARD, Julien. Knowledge discovery in pharmaceutical sciences, from metabolomics to molecular modelling. Doctoral thesis: Univ. Genève, 2009, no. Sc. 4091

URN : urn:nbn:ch:unige-24440

DOI : 10.13097/archive-ouverte/unige:2444

Available at: http://archive-ouverte.unige.ch/unige:2444

Disclaimer: layout of this document may differ from the published version.


Knowledge Discovery In Pharmaceutical Sciences,

From Metabolomics To Molecular Modelling

THESIS

submitted to the Faculty of Science of the University of Geneva for the degree of Doctor of Science (Docteur ès Sciences), interdisciplinary mention

by

Julien Boccard

from Geneva (GE)

Thesis no. 4091

Geneva, Atelier ReproMail

2009


I would like to express my most sincere gratitude to Dr Serge Rudaz for placing his trust in me and allowing me to carry out this thesis work within the Laboratoire de Chimie Analytique Pharmaceutique. Thank you for your constant support, dynamism and enthusiasm. Thank you also for teaching me a superb conception of scientific research and for the valuable advice given during our many exchanges of ideas, whether in the analytical field or in statistics.

I would also particularly like to thank Professor Pierre-Alain Carrupt for welcoming me into the Pharmacochimie group and allowing me to benefit from a highly motivating working environment, in terms of both human relations and the exceptional resources made available. Thank you also for all the first-rate scientific input on molecular modelling and data processing, for the great freedom I enjoyed in my work, and for an unfailing serenity.

I would also like to express my deep gratitude to Dr Elia Grata and Gaétan Glauser, with whom I collaborated closely on the plant metabolomics project. Thank you for the motivation, the relaxed atmosphere and the shared friendship. The results presented in this thesis are above all the fruit of a wonderful collective adventure.

I would also like to thank my colleagues Dr Antoine Daina, Dr Sophie Martel, Juan Bravo and Elisabeth Favre, with whom I shared as many enriching scientific discussions as moments of good humour and friendship.

I am grateful to Professor Jean-Luc Wolfender, who worked to set up the metabolomics platform and through whom I had my first contact with the school of pharmacy. Thank you for your boundless creativity, infectious enthusiasm and inexhaustible energy.

[...] within the Laboratoire de Chimie Analytique Pharmaceutique.

Thank you to the members of the thesis jury, Dr Sonia Poli, Dr Mohamed Hanafi and Dr Laurent Geiser, for their careful reading, their critical eye and their valuable recommendations.

I would like to thank all those who contributed to the various aspects of this thesis through their sound advice and experience: in chemometrics, Professor Pierre Lantéri and Jean-Yves Gauvrit of Université Claude Bernard in Lyon and Gérard Mazerolles of INRA in Montpellier; in the inhibition of ABC transporters, Dr Edwige Nicolle and Professor Ahcène Boumendjel of Université Joseph Fourier in Grenoble; and in data mining, Professor Mélanie Hilario and Dr Alexandros Kalousis of the Laboratoire d’Intelligence Artificielle of the University of Geneva.

Many thanks to all the colleagues I worked alongside in the section of Pharmaceutical Sciences for letting me discover the small world of pharmacy and for accompanying me during these thesis years.

I would also like to thank my family and friends for their encouragement and continual support.

And finally, my special thanks go to Suzanne, without whom my life would not be what it is.


Table of contents

Foreword
Summary of the thesis
Articles and presentations

Synopsis
Extracting knowledge from highly multivariate data

PART A UPLC-TOF/MS metabolic fingerprinting in Arabidopsis thaliana

Abbreviations

1 INTRODUCTION
1.1 Metabolomics
1.1.1 Systems biology
1.1.2 Applications of metabolomics
1.1.3 Arabidopsis thaliana, a model for plant metabolomics
1.2 Analytical methods
1.2.1 General considerations
1.2.2 Sample preparation & extraction
1.2.3 Mass spectrometry
1.2.4 Direct injection mass spectrometry
1.2.5 Chromatography-mass spectrometry
1.2.5.1 Gas chromatography-mass spectrometry
1.2.5.2 Liquid chromatography-mass spectrometry
1.2.6 Nuclear magnetic resonance spectroscopy
1.3 Metabolomic data handling
1.3.1 Data handling
1.3.2 Data pre-processing
1.3.2.1 Data files
1.3.2.2 Centroid data
1.3.2.3 Resolution & data binning
1.3.2.4 Filtering data
1.3.2.5 Peak detection
1.3.2.6 Deconvolution
1.3.2.7 Alignment
1.3.2.8 Normalisation
1.3.2.9 Software packages
1.3.3 Metabolomic data pre-treatment
1.4 Data analysis
1.4.1 Handling the metabolome complexity
1.4.2 Modelling metabolomic data
1.4.3 Univariate hypothesis testing
1.4.4 Exploratory analysis by unsupervised learning
1.4.4.1 Principal components analysis
1.4.4.2 Hierarchical cluster analysis
1.4.5 Classification by supervised learning
1.4.5.1 Partial least squares projection to latent structures
1.4.5.2 Decision trees
1.4.5.3 Artificial neural networks
1.4.5.3.1 Perceptrons
1.4.5.3.2 Radial basis function networks
1.4.5.4 Statistical learning
1.4.5.4.1 Probabilistic learning algorithms
1.4.5.4.2 Instance-based learning algorithms
1.4.5.5 Support vector machines
1.4.6 Variable selection
1.4.6.1 Defying the curse of dimensionality
1.4.6.2 Algorithmic aspects
1.4.6.3 Analysis of variance
1.4.6.4 Information gain
1.4.6.5 Relief
1.4.6.6 Correlation-based feature selection
1.4.6.7 SVM-based feature selection
1.4.6.8 Orthogonal signal correction
1.4.7 Algorithm evaluation
1.4.7.1 Cross validation
1.4.7.2 Classification algorithms comparison
1.5 Bibliographic references

2 PUBLISHED WORKS
2.1 Article I Multivariate data analysis of rapid LC-TOF/MS experiments from Arabidopsis thaliana stressed by wounding
2.2 Article II Development of a two-step screening ESI-TOF-MS method for rapid determination of significant stress-induced metabolome modifications in plant leaf extracts: The wound response in Arabidopsis thaliana as a case study
2.3 Article III UPLC–TOF-MS for plant metabolomics: A sequential approach for wound marker analysis in Arabidopsis thaliana
2.4 Article IV Machine learning applied to UPLC-TOF/MS metabolic fingerprinting for the discovery of wound biomarkers in Arabidopsis thaliana

PART B 3D linear solvation energy analysis to inhibit ABC transporters efflux with flavonoid derivatives

Abbreviations

1 INTRODUCTION
1.1 Multidrug resistance
1.1.1 Chemotherapy and multidrug resistance
1.1.2 ABC transporters
1.1.3 P-glycoprotein
1.1.4 Breast cancer resistance protein
1.1.5 Reversal of drug resistance
1.1.5.1 General considerations
1.1.5.2 Flavonoids as ABC transporters inhibitors
1.2 Modelling the interactions between flavonoid derivatives and ABC transporters
1.2.1 Chemoinformatics
1.2.2 Quantitative structure-activity relationship
1.2.2.1 The pharmacophore concept
1.2.2.2 Molecular descriptors
1.2.2.3 Molecular interaction fields
1.2.3 From LSER to solvatochromic MIFs
1.2.3.1 Linear solvation energy relationship
1.2.3.2 Theoretical LSER molecular descriptors
1.2.3.2.1 GRID – DRY
1.2.3.2.2 Molecular lipophilicity potential
1.2.3.2.3 Molecular hydrogen-bonding potentials
1.2.4 Exploring molecular interaction fields
1.2.4.1 3D quantitative structure-activity relationship
1.2.4.2 Comparative molecular field analysis
1.2.4.3 VolSurf
1.2.5 Statistical methods
1.2.5.1 Chemometrics
1.2.5.2 N-way partial least squares
1.2.5.3 Block scaling
1.2.5.4 Model validation
1.3 QSAR studies of ABC transporters
1.3.1 General considerations
1.3.2 Pgp models
1.3.3 BCRP models
1.4 Bibliographic references

2 PUBLISHED WORKS
2.1 Article V A 3D linear solvation energy model to quantify the affinity of flavonoid derivatives toward P-glycoprotein
2.2 Article VI Breast cancer resistance protein (BCRP/ABCG2): New inhibitors and QSAR studies by a 3D linear solvation energy approach

Conclusion and perspectives

Appendix
1. Article VII Optimized liquid chromatography–mass spectrometry approach for the isolation of minor stress biomarkers in plant extracts and their identification by capillary nuclear magnetic resonance
2. Article VIII Synergy at the 'Ecole de Pharmacie Geneve-Lausanne': Methodology developments for the treatment of complex metabolomic datasets with data mining

Foreword

This thesis was carried out jointly within the Laboratoire de Chimie Analytique Pharmaceutique (LCAP) and the Pharmacochimie group of the Laboratoire de Chimie Thérapeutique (LCT-GP), under the co-supervision of Dr Serge Rudaz and Professor Pierre-Alain Carrupt. The work presented in this manuscript is divided into two distinct parts, each with its own theoretical introduction and the scientific articles published in the course of the corresponding research.

The first part of this work belongs to the deployment of a metabolomics platform bringing together the two laboratories mentioned above and the Laboratoire de Pharmacognosie et Phytochimie (LPP). This platform combines analytical expertise in liquid chromatography coupled to mass spectrometry and in nuclear magnetic resonance with the application of multivariate data analysis methods, which are the subject of this thesis.

The project was initiated by the work of Dr Aly Thiocone on the differential LC-MS screening of compounds involved in defence mechanisms in Arabidopsis thaliana (Thesis no. 3793, University of Geneva, 2006). Dr Elia Grata continued this early work and developed a metabolomic approach based on ultra-high-pressure liquid chromatography (UHPLC) coupled to time-of-flight mass spectrometry (TOF-MS) (Thesis no. 4059, University of Geneva, 2009) for the discovery of biomarkers related to the response to mechanical wounding in Arabidopsis thaliana. That work relies on a sequential analytical strategy comprising a first high-throughput screening step followed by a high-resolution metabolic profiling step. The first part of this manuscript gathers the work on the application of multivariate analysis tools to the screening step. Some of the candidate compounds identified by this methodology are then isolated and characterised by nuclear magnetic resonance in the work carried out by Mr Gaétan Glauser (thesis in progress, University of Geneva).

The second part of this work follows on from the thesis of Dr Fania Bajot, which deals with the characterisation of the pharmacokinetic profile of drug candidates by 3D solvatochromic analysis (Thesis no. 3813, University of Geneva, 2006). The approach initiated in that work is applied here to the development of chemosensitisers in the context of multidrug resistance.

The second part of this manuscript presents the study and characterisation of the molecular interaction forces that drive the inhibitory capacity of flavonoid derivatives toward ABC transporters. Two studies on fundamental targets in chemoresistance are presented, concerning P-glycoprotein and the Breast Cancer Resistance Protein, respectively. The approaches applied to data processing combine several methodologies in order to obtain complementary information and thus provide a better assessment of the phenomenon.

Summary of the thesis

With the emergence of new research fields that benefit from innovative technologies, the amount of data available to characterise a phenomenon of interest is constantly increasing. Whereas experimenters used to produce only a limited amount of data that could be analysed by hand, the technological evolution of laboratory instruments and computing has led to a fundamental change in the way data analysis is conceived. Now that modern research methodologies produce data at an exponential rate, it is no longer conceivable to exploit them with a traditional approach. Although human supervision remains a fundamental prerequisite, the processing and extraction of relevant information are becoming increasingly automated. Moreover, classical statistical tools are not suited to the high-dimensional data that characterise these approaches today. In this context, multivariate analysis methods have emerged to address these new needs.

These new ways of approaching data that are often massive and complex are grouped under the terms chemometrics, which stems from analytical chemistry, and data mining, which is linked to computer science, artificial intelligence and databases. The central idea emerging from these disciplines is the transformation of the growing amount of available data into relevant information that can ultimately generate knowledge.

This thesis presents two examples in this direction: a first application concerning a metabolomic study of the wound response in the model plant Arabidopsis thaliana, and a second concerning the study of the physicochemical properties of flavonoid derivatives that govern their ability to inhibit the activity of ABC transporters responsible for chemoresistance.


Part A: Metabolomic data analysis in A. thaliana

The wound response in plants results from the complex transduction of signals and specific metabolites. Although some molecules involved in defence signalling, such as salicylic acid or jasmonic acid, have been identified by targeted analytical methods, the induction of defence gene expression cannot be explained solely on the basis of the known chemical mediators. New molecular signals involved in this phenomenon therefore remain to be discovered.

The development of rapid screening methods for sample classification and biomarker detection is an important research area in metabolomics. Recent advances in analytical technologies now make it possible to obtain high-quality data rapidly. This is particularly true of liquid chromatography coupled to mass spectrometry (LC-MS), which is one of the key tools for the analysis and identification of chemical mediators. The prior chromatographic separation makes it possible to cover a wide polarity range of compounds, which can be evaluated in parallel within a single analysis. Detecting and highlighting significant concentration changes in complex organisms such as plants nevertheless remains difficult, because these metabolites must be distinguished within the huge mass of information generated by untargeted measurements.

The first four articles of this manuscript describe a metabolomic study of Arabidopsis thaliana after stress induction by mechanical wounding. The proposed approach comprises data pre-processing, representation and analysis, in order to assess the metabolic modifications in this model plant rapidly and as comprehensively as possible.

The first two publications (Articles I and II) concern the use of a rapid analysis technology based on reversed-phase high-performance liquid chromatography (HPLC) coupled to a time-of-flight mass spectrometer (LC-ToF). A screening method is set up, based on the comparison of total mass spectra that omit the chromatographic retention time information. Data processing combines analysis of variance (ANOVA), orthogonal signal correction (OSC), partial least squares discriminant analysis (PLS-DA), principal component analysis (PCA) and hierarchical cluster analysis (HCA). The validity of the approach is illustrated by spiking marker compounds into a subset of the samples. The screening step makes it possible to select individuals with homogeneous phenotypes in order to carry out in-depth profiling analyses on a conventional chromatographic gradient with a large amount of plant material. Metabolites potentially induced by signalling are also highlighted and constitute the specific targets of the subsequent profiling step.
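As an illustration of the ANOVA step of such a screening pipeline, the following minimal sketch computes a one-way ANOVA F statistic for each variable and ranks the variables by it. It assumes that each sample is a total mass spectrum binned into a fixed list of m/z intensities; the toy data, the class labels and the function names `anova_f` and `rank_variables` are hypothetical and are not taken from the thesis or its articles.

```python
from statistics import fmean


def anova_f(groups):
    """One-way ANOVA F statistic for a single variable.

    `groups` holds one list of intensities per class.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    # Between-group sum of squares: spread of the class means.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread inside each class.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_variables(spectra, labels):
    """Return (variable index, F) pairs sorted by decreasing F."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(spectra[0])):
        groups = [
            [s[j] for s, y in zip(spectra, labels) if y == c] for c in classes
        ]
        scores.append((j, anova_f(groups)))
    return sorted(scores, key=lambda t: t[1], reverse=True)


# Hypothetical toy data: 6 samples x 4 m/z bins.
# Only bin 2 differs between the two groups.
spectra = [
    [10.0, 5.0, 1.0, 7.0],
    [11.0, 4.0, 1.2, 6.0],
    [9.0, 6.0, 0.9, 8.0],
    [10.5, 5.5, 4.1, 7.5],
    [9.5, 4.5, 3.8, 6.5],
    [10.0, 5.0, 4.3, 7.0],
]
labels = ["control", "control", "control", "wounded", "wounded", "wounded"]

ranking = rank_variables(spectra, labels)
print(ranking[0][0])  # index of the top-ranked bin (bin 2 for this toy data)
```

In a real study the F values would be converted to p-values and corrected for multiple testing before any variable is retained; that part is left out of this sketch.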

Articles III and IV illustrate the adaptation and optimisation of the two-step approach described above to ultra-performance liquid chromatography (UPLC) coupled to a time-of-flight mass spectrometer (UPLC-ToF/MS). Unlike in the two previous articles, the data obtained are no longer compressed into total mass spectra but are pre-processed with the commercial software MarkerLynx®. Each peak is thus characterised by an m/z ratio associated with a given retention time. The wound response is analysed under broader conditions, comparing local and distal leaves harvested after several response times (90 min, 3 h, 6 h and 24 h).

First, the multivariate analysis tools described in the first two articles are applied to the newly obtained data. Machine learning methods are also proposed for an in-depth analysis of the interactions between metabolites and their respective contributions to the separation of the different experimental groups.

Several of the best-known algorithms are used with the aim of comparing performance. The proposed methods include classification methods as well as variable selection algorithms. This work illustrates the usefulness of machine learning classification algorithms for building predictive models. Variable selection prior to classification benefits most algorithms and makes it possible to build simpler and more easily interpretable models.
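The effect of variable selection before classification can be sketched on a small example. The code below is an assumption-laden illustration, not the procedure used in the thesis: it uses a simple two-class filter score (hypothetical function `fisher_score`), a 1-nearest-neighbour classifier and leave-one-out cross-validation, and the toy data are deliberately constructed so that two uninformative variables dominate the Euclidean distance. The variables are re-selected inside every fold, a design choice made here to avoid an optimistic bias in the accuracy estimate.

```python
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Two-class filter score: squared mean difference over summed variances."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = pvariance(a) + pvariance(b)
    return (fmean(a) - fmean(b)) ** 2 / (spread + 1e-12)


def select_top_k(X, y, k):
    """Indices of the k variables with the highest filter score."""
    n_vars = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_vars)]
    return sorted(range(n_vars), key=lambda j: scores[j], reverse=True)[:k]


def predict_1nn(X_train, y_train, x, variables):
    """Label of the nearest training sample, using only the given variables."""
    def dist(row):
        return sum((row[j] - x[j]) ** 2 for j in variables)
    nearest = min(range(len(X_train)), key=lambda i: dist(X_train[i]))
    return y_train[nearest]


def loo_accuracy(X, y, k=None):
    """Leave-one-out accuracy; with k set, variables are selected per fold."""
    hits = 0
    for i in range(len(X)):
        X_tr = X[:i] + X[i + 1:]
        y_tr = y[:i] + y[i + 1:]
        if k is None:
            variables = range(len(X[0]))  # use every variable
        else:
            variables = select_top_k(X_tr, y_tr, k)
        hits += predict_1nn(X_tr, y_tr, X[i], variables) == y[i]
    return hits / len(X)


# Hypothetical toy data: variable 0 separates the classes,
# variables 1 and 2 are large-scale noise unrelated to the class.
X = [
    [1.0, 50.0, 20.0],
    [1.2, 10.0, 80.0],
    [0.8, 90.0, 50.0],
    [4.0, 52.0, 21.0],
    [4.2, 12.0, 79.0],
    [3.8, 88.0, 52.0],
]
y = [0, 0, 0, 1, 1, 1]

acc_all = loo_accuracy(X, y)            # all three variables
acc_selected = loo_accuracy(X, y, k=1)  # best single variable per fold
print(acc_all, acc_selected)
```

On this contrived dataset the classifier using all variables is misled by the noise, whereas the model restricted to one selected variable is both simpler and more accurate; real data will rarely show such a clean contrast.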

A geometric transfer then makes it possible to locate the metabolites of interest precisely, thanks to a chromatography with high peak capacity that preserves selectivity. This step allows targeted confirmatory analysis of the candidates from the screening within a metabolic profiling framework. The structure of compounds of particular interest can then be elucidated by nuclear magnetic resonance (NMR), as illustrated in Article VII of the appendix. Finally, Article VIII presents the overall metabolomics platform project developed within the section of Pharmaceutical Sciences.

Part B: 3D analysis of molecular interaction forces to characterise the inhibitory potential of flavonoid derivatives toward ABC transporters

The multidrug resistance phenomenon that can arise during anticancer chemotherapy is mainly linked to the overexpression of the ABC transporter family. These efflux proteins, located in the plasma membrane, play a natural detoxification role.

They are able to expel a wide variety of xenobiotics, and ABC-transporter-dependent resistance mechanisms therefore have important therapeutic implications. One strategy for improving treatments is chemosensitisation, which consists in restoring the cytoplasmic accumulation of drug compounds by using inhibitors of these transporters. Flavonoids are natural compounds that are prime candidates for this purpose.

In the search for new flavonoid-derived inhibitors, a solvatochromic approach is proposed to carry out a three-dimensional quantitative structure-activity relationship (3D QSAR) analysis.

This methodology involves pharmacodynamic properties related to hydrophobic interaction forces, polarisability, and hydrogen-bond acceptor and donor potentials. Understanding the molecular properties linked to the inhibitory capacity of active compounds aims at the identification and synthesis of more effective and more specific chemosensitising compounds. These properties are characterised using molecular interaction fields, which provide a precise description of the interaction forces in three-dimensional space for each compound. The resulting dataset, however, has a complex structure that is not compatible with traditional data analysis tools. Several approaches are proposed to handle the large amount of available information and to extract the elements relevant to describing the inhibition of ABC transporters by flavonoids.

Two studies apply this methodology to P-glycoprotein and to the Breast Cancer Resistance Protein. This work illustrates the usefulness of 3D solvatochromic analysis, originally associated with the description of pharmacokinetic properties, for investigating a pharmacodynamic phenomenon.


Articles and presentations

This thesis work has given rise to several scientific communications, in the form of articles published in peer-reviewed journals as well as oral or poster communications at national and international conferences.

Scientific publications

I. Multivariate data analysis of rapid LC-TOF/MS experiments from Arabidopsis thaliana stressed by wounding. Boccard, Julien; Grata, Elia; Thiocone, Aly; Gauvrit, Jean-Yves; Lanteri, Pierre; Carrupt, Pierre-Alain; Wolfender, Jean-Luc; Rudaz, Serge. Chemometrics and Intelligent Laboratory Systems (2007), 86(2), 189-197.

II. Development of a two-step screening ESI-TOF-MS method for rapid determination of significant stress-induced metabolome modifications in plant leaf extracts: the wound response in Arabidopsis thaliana as a case study. Grata, Elia; Boccard, Julien; Glauser, Gaetan; Carrupt, Pierre-Alain; Farmer, Edward E.; Wolfender, Jean-Luc; Rudaz, Serge. Journal of Separation Science (2007), 30(14), 2268-2278.

III. UPLC-TOF-MS for plant metabolomics: A sequential approach for wound marker analysis in Arabidopsis thaliana. Grata, Elia; Boccard, Julien; Guillarme, Davy; Glauser, Gaetan; Carrupt, Pierre-Alain; Farmer, Edward; Wolfender, Jean-Luc; Rudaz, Serge. Journal of Chromatography, B: Analytical Technologies in the Biomedical and Life Sciences (2008), 871(2), 261-270.

IV. Machine learning applied to UPLC-TOF/MS metabolic fingerprinting for the discovery of wound biomarkers in Arabidopsis thaliana. Boccard, Julien; Grata, Elia; Glauser, Gaetan; Lantéri, Pierre; Kalousis, Alexandros; Hilario, Melanie; Carrupt, Pierre-Alain; Wolfender, Jean-Luc; Rudaz, Serge. Submitted, Metabolomics 2009.

V. A 3D linear solvation energy model to quantify the affinity of flavonoid derivatives toward P-glycoprotein. Boccard, Julien; Bajot, Fania; Di Pietro, Attilio; Rudaz, Serge; Boumendjel, Ahcene; Nicolle, Edwige; Carrupt, Pierre-Alain. European Journal of Pharmaceutical Sciences (2009), 36(2-3).

VI. Breast Cancer Resistance Protein (BCRP, ABCG2): New Inhibitors and QSAR Studies by 3D Solvatochromic Analysis. Nicolle, Edwige; Boccard, Julien; Guilet, David; Dijoux-Franca, Marie-Geneviève; Zelefac, Fabien; Macalou, Sira; Grosselin, Jeanne; Schmidt, Julien; Carrupt, Pierre-Alain; Di Pietro, Attilio; Boumendjel, Ahcène. In Press, Accepted Manuscript, European Journal of Pharmaceutical Sciences (2009).

VII. Optimized liquid chromatography-mass spectrometry approach for the isolation of minor stress biomarkers in plant extracts and their identification by capillary nuclear magnetic resonance. Glauser, Gaetan; Guillarme, Davy; Grata, Elia; Boccard, Julien; Thiocone, Aly; Carrupt, Pierre-Alain; Veuthey, Jean-Luc; Rudaz, Serge; Wolfender, Jean-Luc. Journal of Chromatography, A (2008), 1180(1-2), 90-98.

VIII. Synergy at the 'Ecole de Pharmacie Geneve-Lausanne': Methodology developments for the treatment of complex metabolomic datasets with data mining. Thiocone, Aly; Grata, Elia; Boccard, Julien; Carrupt, Pierre-Alain; Rudaz, Serge; Wolfender, Jean-Luc. Chimia (2005), 59(6), 362-365.

IX. Plant metabolomics - strategies for biomarker detection, isolation, and identification. Grata, Elia; Boccard, Julien; Glauser, Gaetan; Guillarme, Davy; Carrupt, Pierre-Alain; Wolfender, Jean-Luc; Rudaz, Serge. Chimia (2008), 62(7-8), 685.

X. UPLC-TOF-MS for Plant Metabolomics: A Powerful Tool for Discriminating Stress States in Plants and Detecting Low-Level Wound Biomarkers. Glauser, Gaetan; Grata, Elia; Boccard, Julien; Marsden-Edwards, Emma; Major, Hillary; Carrupt, Pierre-Alain; Rudaz, Serge; Wolfender, Jean-Luc. American Laboratory On-line Edition, June 2008.


Only the talks and posters presented as first author or co-author are listed below.

Oral presentations

I. Statistical Comparison of LC-MS Ion Maps in Metabolomic Studies. Boccard, Julien; Grata, Elia; Thiocone, Aly; Muller, Markus; Rudaz, Serge; Wolfender, Jean-Luc. Fall Meeting of the Swiss Chemical Society, October 2004, Zürich (Switzerland).

II. Multivariate analysis of data obtained by rapid-gradient LC-TOF-MS in a metabolomic study of A. thaliana. Boccard, Julien; Grata, Elia; Thiocone, Aly; Carrupt, Pierre-Alain; Wolfender, Jean-Luc; Rudaz, Serge. Chimiométrie, December 2005, Lille (France).

III. 3D solvatochromic analysis of the molecular interaction forces driving the binding of flavonoid derivatives to P-glycoprotein. Boccard, Julien; Bajot, Fania; Nicolle, Edwige; Boumendjel, Ahcène; Gauvrit, Jean-Yves; Lantéri, Pierre; Carrupt, Pierre-Alain; Rudaz, Serge. Chimiométrie, November 2007, Lyon (France).

IV. Machine learning applied to UPLC-TOF/MS metabolic fingerprinting for the discovery of wound biomarkers in Arabidopsis thaliana. Boccard, Julien. PhD Day, June 2008, Archamps (France).

V. Metabolomic analyses in phytochemistry. Wolfender, Jean-Luc; Boccard, Julien. Journées Scientifiques du CCCTA, September 2008, Les Diablerets (Switzerland).

Poster presentations

I. Visualisation and comparison of data obtained by LC-APCI-MS in a metabolomic study of A. thaliana. Boccard, Julien; Grata, Elia; Thiocone, Aly; Muller, Markus; Carrupt, Pierre-Alain; Wolfender, Jean-Luc; Rudaz, Serge. Chimiométrie, November 2004, Paris (France).

II. Multivariate analysis of metabolomic data from Arabidopsis thaliana stressed by wounding with rapid LC/TOF-MS and gradient LC/quadrupole-MS analysis. Boccard, Julien; Grata, Elia; Didon, Céline; Thiocone, Aly; Carrupt, Pierre-Alain; Wolfender, Jean-Luc; Rudaz, Serge. Fall Meeting of the Swiss Chemical Society, October 2005, Lausanne (Switzerland).

III. Multivariate data analysis of rapid LC-TOF/MS experiments from Arabidopsis thaliana stressed by wounding. Boccard, Julien; Grata, Elia; Glauser, Gaétan; Carrupt, Pierre-Alain; Gauvrit, Jean-Yves; Lantéri, Pierre; Wolfender, Jean-Luc; Rudaz, Serge. Fall Meeting of the Swiss Chemical Society, October 2006, Zurich (Switzerland).

IV. Multivariate data analysis of rapid LC-TOF/MS experiments from Arabidopsis thaliana stressed by wounding. Boccard, Julien; Grata, Elia; Glauser, Gaétan; Carrupt, Pierre-Alain; Gauvrit, Jean-Yves; Lantéri, Pierre; Wolfender, Jean-Luc; Rudaz, Serge. Rencontre des Sciences Pharmaceutiques de l'Arc Alpin, November 2006, Archamps (France).

V. Multivariate analysis by OSC-PCA/HCA of metabolomic data obtained by HPLC-TOF/MS in A. thaliana. Boccard, Julien; Grata, Elia; Glauser, Gaétan; Carrupt, Pierre-Alain; Gauvrit, Jean-Yves; Lantéri, Pierre; Wolfender, Jean-Luc; Rudaz, Serge. Chimiométrie, November 2006, Paris (France).

VI. Multiway PLS calibration based on 3D solvatochromic descriptors with artificial membrane PAMPA-skin experimental data. Boccard, Julien; Ottaviani, Giorgio; Martel, Sophie; Rudaz, Serge; Carrupt, Pierre-Alain. GGMM, May 2007, Autrans (France).

VII. Multiway 3D Solvatochromic Analysis of Molecular Interaction Forces Between Flavonoid Derivatives and P-glycoprotein. Boccard, Julien; Bajot, Fania; Nicolle, Edwige; Boumendjel, Ahcène; Rudaz, Serge; Carrupt, Pierre-Alain. Fall Meeting of the Swiss Chemical Society, September 2007, Lausanne (Switzerland).


Knowledge Discovery

In Pharmaceutical Sciences,

From Metabolomics

To Molecular Modelling


Synopsis

Extracting knowledge from highly multivariate data

Modern scientific technologies can generate massive datasets describing specific phenotypes that illustrate a biological phenomenon. Scientific research has consequently shifted from classical hypothesis-driven approaches towards more pragmatic data-driven methodologies. While the former first establish a theory and only afterwards collect experimental data to test it, the latter start with an extensive data collection and aim to build relevant hypotheses from it. Extracting the relevant information from this overwhelming amount of data has become a decisive issue for knowledge discovery. Chemometrics and data mining are closely related research fields that address these considerations. Despite similar goals, they differ principally in the field from which they originate, i.e. analytical chemistry and computer science, respectively. While Massart defined chemometrics as “a chemical discipline that uses mathematics, statistics and formal logic (a) to design or select optimal procedures; (b) to provide maximum relevant chemical information by analyzing chemical data; and (c) to obtain knowledge about chemical systems” 1, Hand described data mining as “the analysis of (often large) observational datasets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner” 2. These two definitions bear obvious similarities, as both disciplines aim to process and understand complex data structures. Data mining was, however, originally more dedicated to the analysis of databases, while chemometrics also includes experimental design.

Two approaches are possible when considering the ensemble of variables that contribute to an observable fact: the variables can be examined either independently, using univariate methods, or simultaneously, using multivariate and megavariate techniques. Considering each variable independently is a straightforward way to evaluate its predictive merit, but the notion of multivariate profiles involving many related features reflects biological reality. Standard training in data analysis and statistics nevertheless remains largely limited to univariate methodologies such as hypothesis testing, e.g. for the comparison of means. Statistical tools based on univariate methods constitute a starting point but may be inadequate to establish a relevant relationship between a complex ensemble of variables and a specific phenotype. Moreover, they are only relevant if no other source contributes to the observed response; otherwise the results will be biased 3.

Since biological phenomena are intrinsically multivariate, it appears appropriate to consider multiple measurements concurrently. In complex biological data, the relevant information is usually a combination of signals rather than a single variable, and more sophisticated statistical methods may be required. By considering all variables together, more information can be obtained than by comparing each variable independently. Classical multivariate techniques have been developed to allow the evaluation of more than one variable at the same time.
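The benefit of considering variables together can be illustrated with a minimal simulated sketch. The data, the class offset of 0.5 and the noise levels below are arbitrary assumptions chosen for illustration only; they do not come from any dataset discussed in this manuscript. Each variable taken alone barely separates the two classes, whereas a simple combination of both separates them almost perfectly.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed for reproducibility
n_per_class = 200

# Two simulated classes share the same broad distribution of x1;
# only the offset between x2 and x1 differs (+0.5 versus -0.5).
x1 = rng.normal(0.0, 3.0, size=2 * n_per_class)
labels = np.repeat([1, -1], n_per_class)
x2 = x1 + 0.5 * labels + rng.normal(0.0, 0.1, size=2 * n_per_class)

def accuracy(feature: np.ndarray) -> float:
    """Fraction of samples whose class equals the sign of the feature."""
    return float(np.mean(np.sign(feature) == labels))

acc_x1 = accuracy(x1)              # variable 1 considered alone
acc_x2 = accuracy(x2)              # variable 2 considered alone
acc_combined = accuracy(x2 - x1)   # both variables considered together
print(acc_x1, acc_x2, acc_combined)
```

The sign-based classifier is deliberately naive; the point is only that the discriminating information lies in the relationship between the two variables rather than in either of them.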

Data are classically stored in tables where each observation is an independent entity represented by a row and described by a set of variables, corresponding to the columns, whose values measure different aspects of the observation. Unfortunately, many modern datasets violate the basic prerequisites of classical multivariate analysis methods such as multiple regression. Indeed, these methods are designed to handle “long and lean” data tables, where a small number K of variables is measured for numerous samples N (K < N, see figure 1). A full-rank data matrix involving independent and uncorrelated variables constitutes another important requirement. As mentioned above, highly multivariate data clearly do not fit such definitions, as thousands of highly inter-correlated variables are measured for only a small number of observations. The shape of these tables has become highly unbalanced because the number of variables measured by the instrument is now much higher than the number of observations. Traditional data analysis methodologies are therefore unsuitable, and there is a need for computational tools helping scientists to extract the useful information content of such so-called “short and fat” data tables (K >> N).
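The full-rank requirement can be checked numerically. The following sketch uses random tables of arbitrary, hypothetical sizes: the rank of an N x K table cannot exceed min(N, K), so a “short and fat” table can never have full column rank and the matrix X'X needed by ordinary multiple regression cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def rank_and_invertibility(n_samples: int, n_variables: int) -> tuple[int, bool]:
    """Return rank(X) and whether X'X (K x K) is invertible for a random table X."""
    X = rng.normal(size=(n_samples, n_variables))
    rank = int(np.linalg.matrix_rank(X))
    # Ordinary least squares needs the inverse of X'X, which exists only if rank == K.
    return rank, rank == n_variables

long_lean = rank_and_invertibility(n_samples=100, n_variables=5)    # K < N
short_fat = rank_and_invertibility(n_samples=10, n_variables=500)   # K >> N
print(long_lean, short_fat)
```

In real instrumental data the situation is usually worse than in this random example, because strongly inter-correlated variables lower the effective rank further.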

Figure 1: Schematic representations of multivariate datasets: classical multivariate data (N individuals in rows, K variables in columns, K < N) and highly multivariate data (K >> N).


Fortunately, multivariate methodologies provide potent alternatives to classical statistical methods. Such approaches were conceived from the outset to model real data, and they have been widely adopted in many research fields, including chemical engineering, quantitative structure-activity relationships in medicinal chemistry and environmental studies. The main principle of multivariate analysis relies on the computation of models that summarise the information content of a large number of potentially correlated and noisy variables into comprehensible concepts. Such models are expected to provide both robustness and easy interpretability for data of massive dimensionality by reducing the data complexity. Multivariate analysis is therefore an attractive approach to build relevant models of biological data in many applications, and these aspects contribute to the increasing popularity of these methods.
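As a minimal illustration of this summarising principle, the sketch below applies principal component analysis, computed through a singular value decomposition, to simulated data. The table size (12 observations, 200 variables), the single hidden factor and the noise level are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_var = 12, 200

# Simulated "short and fat" table: one hidden factor drives all variables,
# and independent noise is added to each measurement.
latent = rng.normal(size=(n_obs, 1))
loadings = rng.normal(size=(1, n_var))
X = latent @ loadings + 0.1 * rng.normal(size=(n_obs, n_var))

Xc = X - X.mean(axis=0)                          # mean-centre each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                                   # coordinates of the observations
explained = s**2 / np.sum(s**2)                  # variance fraction per component
print(explained[:3])
```

Under these assumptions, the first component carries almost all of the variance, so 200 correlated and noisy variables are summarised by a single score per observation.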

In this manuscript, two examples with highly complementary characteristics are proposed to illustrate the potential of data mining and chemometrics to handle these massive datasets, extract relevant patterns and offer insights into specific biological phenomena. The first part involves the exploration of a dataset from the metabolomics domain, a research field which has gained considerable interest in recent years. This dataset is characterised by the experimental evaluation of a great number of independent variables (X block) depicting a class dependent variable Y of known structure. This metabolomic dataset is therefore investigated in the context of biomarker discovery, classification and pattern recognition. The second part is dedicated to the evaluation of a quantitative structure-activity relationship depicting the ability of flavonoid derivatives to bind and potentially inhibit the action of ABC efflux transporters. This second dataset is composed of an X block with independent variables of known structure computed in silico and a target dependent Y variable evaluated experimentally. This second part assesses the quantitative prediction ability of the descriptors in a regression context.

Bibliographic references

1. Massart, D. L.; Vandeginste, B. G. M.; Buydens, L. M. C.; De Jong, S.; Lewi, P. J.; Smeyers-Verbeke, J.; Editors. Handbook of Chemometrics and Qualimetrics, Part A; 1997; p 884.

2. Hand, D. J.; Mannila, H.; Smyth, P. Principles of Data Mining; MIT Press: Cambridge, MA, 2001.

3. Bro, R. Multivariate calibration - What is in chemometrics for the analytical chemist? Anal. Chim. Acta 2003, 500, 185-194.


PART A

UPLC-TOF/MS

metabolic fingerprinting

in Arabidopsis thaliana


List of abbreviations – Part A

ANN Artificial Neural Network
ANOVA Analysis Of Variance
APCI Atmospheric Pressure Chemical Ionisation
API Atmospheric Pressure Ionisation
APPI Atmospheric Pressure Photoionisation
CE Capillary Electrophoresis
CFS Correlation-based Feature Selection
CI Chemical Ionisation
CODA Component Detection Algorithm
COSY Correlation Spectroscopy
COW Correlation Optimized Warping
DAG Directed Acyclic Graphs
DOSY Diffusion Ordered Spectroscopy
DIMS Direct Infusion Mass Spectrometry
DOE Design Of Experiments
DTW Dynamic Time Warping
EI Electron Impact Ionisation
ESI Electrospray Ionisation
FT-IR Fourier Transform-Infrared Spectroscopy
GC Gas Chromatography
HCA Hierarchical Cluster Analysis
HILIC Hydrophilic Interaction Chromatography
HPLC High-Performance Liquid Chromatography
IG Information Gain
KNN K-Nearest Neighbour
LC Liquid Chromatography
LV Latent Variable
MLP Multilayer Perceptron
MS Mass Spectrometry
mzXML Mass Spectrometry Extensible Markup Language
NetCDF Network Common Data Form
NB Naïve Bayes
NMR Nuclear Magnetic Resonance
OPLS Orthogonal Partial Least Squares
OSC Orthogonal Signal Correction
PC Principal Component
PCA Principal Components Analysis
PCR Principal Components Regression
PLS Partial Least Squares
PLS-DA Partial Least Squares Discriminant Analysis
Q-TOF Quadrupole-Time-Of-Flight
RBF Radial Basis Function
SVD Singular Value Decomposition
SVM Support Vector Machines
TIC Total Ion Current
TOF Time Of Flight
TMS Total Mass Spectra
UPLC Ultra-Performance Liquid Chromatography
UV Unit Variance


1 INTRODUCTION

1.1 Metabolomics

1.1.1 Systems biology

Molecular biology demonstrated the importance of genes and their products in the functioning of cells. Today, the study of transcripts and proteins progresses rapidly thanks to well-established methodologies. On the other hand, a wide variety of other organic compounds that are not directly encoded in the DNA are recognised as critical elements for cell function, i.e. the metabolites.

Metabolomics (also known as metabonomics) is a recent discipline aiming at the global study of metabolites and their concentrations, interactions and dynamics within complex samples 1. It constitutes one of the tools of the post-genomic era 2, the latter being concerned with the study of the different functional levels of a biological system, i.e. the transcriptome, the proteome and the metabolome 3. Systems biology 4 relies on a holistic approach integrating all this information and attempting to provide insight into the interactions between these levels to model biological states (see figure A1) 5-7.

Figure A1: Systems biology and interactions between the genome, the transcriptome, the proteome and the metabolome (diagram labels: Genome, Transcriptome, Proteome, Metabolome, Environment).


As metabolites can be considered as the downstream products of cellular regulatory processes 8, metabolomics data can precisely characterise cells, tissues, biofluids or whole organisms by defining a specific biochemical phenotype that is representative of a physiological or developmental state. The metabolome is the holistic quantitative set of low molecular weight compounds (<1000 Da), including many hundreds or thousands of molecules such as carbohydrates, vitamins, lipids and amino or fatty acids. These metabolites participate in the metabolic reactions necessary for the normal function, maintenance or growth of a cell 2. Their origin can be either endogenous, such as products of biosynthesis and catabolism, or exogenous, such as nutrition or the degradation of pharmaceutical compounds 9, 10. The chemical properties of the organic or inorganic constituents of this ensemble are highly variable, and this diversity is a critical aspect to consider 11. The variability in molecular weight, polarity or solubility and the wide dynamic range of concentrations (from pmol to mmol) constitute additional difficulties when analysing metabolites, particularly since no amplification process is available. Finally, the absence of a sequence of repeated building blocks in metabolite structures makes their identification more complicated than that of transcripts and proteins.

Estimates of metabolome size are highly dependent on the organism under study, and Verpoorte et al. suggested a comparable number of metabolites and genes 12. It has to be noted that there is obviously no direct link between every gene involved in metabolism and a given metabolite. Other estimates propose metabolome sizes varying from 600 elements for Saccharomyces cerevisiae up to 15,000 distinct compounds within a given plant species, while the human metabolome is expected to be even larger 13, 14. Provided that the tissue extraction method, sample preparation, data acquisition, data processing, analysis and interpretation are carefully evaluated, assessing metabolic profiles can uncover complex phenotypes and specific biomarkers for physiological responses 15. More generally, it could improve the biological knowledge of the metabolome as a part of functional genomics.

Valuable information for determining gene function in biological systems can be obtained concurrently with gene expression (transcriptomics) and translation (proteomics) data, and functional analyses have recently gained considerable interest 16-18. Furthermore, metabolic data have been shown to be more discriminatory of biological changes than transcriptomic or proteomic data 19.

Metabonomics was defined by Nicholson et al. as the “quantitative measurement of the time-related multi-parametric metabolic response of living systems to pathophysiological stimuli or genetic modification” 1. In addition, Fiehn et al. established distinct approaches within metabolomics in 2002 by defining the terms metabolite target analysis, metabolite profiling and metabolic fingerprinting 20. Related approaches, such as metabolic footprinting, which assesses the exometabolome and the secretome, are also associated with metabolomics 3.

Metabolic fingerprinting is a non-targeted screening method intended for the high-throughput analysis of numerous samples for classification purposes. The quantification and identification of metabolites are usually not required; instead, this global approach explores differences between the metabolite patterns of distinct physiological states by means of multivariate data analysis. Sample preparation remains simple, and hyphenated analytical methods (GC-MS, LC-MS, CE-MS) with short separation times or direct analysis (direct MS infusion or NMR) are generally used 21.

Metabolite profiling implies the identification and quantification of selected metabolites related to a definite metabolic pathway 22. Sample preparation and analytical methods are adapted for the specific examination of particular compounds. As the oldest and most conventional approach, profiling is considered as the seminal approach of metabolomics.

Target analysis involves the qualitative and quantitative detection of precise metabolites. Dedicated sample preparation is developed to eliminate extraneous signals. Dedicated analytical methods focusing on a small part of the sample are used, and sensitive detection is performed.

Metabolomics implies the still theoretical comprehensive analysis of all metabolites belonging to a biological system 23. A combination of analytical techniques is used to cover the chemical diversity of the molecules, while selectivity and sensitivity are expected to be high. This non-selective approach should not exclude any metabolite. Once metabolites involved in a biological phenomenon are detected, further targeted analyses can be performed to confirm and specifically study compounds of specific interest. These approaches are summarised in figure A2.

Figure A2: The facets of metabolomics defined by Fiehn 20: specific analysis of metabolites, quantification of specific targets, global analysis and classification, and exhaustive identification of all metabolites.

Similarly to transcriptomics and proteomics, data standardisation protocols have been proposed. The Minimum Information About a Metabolomics Experiment (MIAMET) was proposed by Bino et al. in 2004 16, while the Metabolomics Society developed guidelines of reporting standards, i.e. the Metabolomics Standards Initiative (MSI) 24, 25, to document experimental studies precisely and allow other researchers to evaluate and repeat metabolomic studies.

1.1.2 Applications of metabolomics

Measuring metabolite changes could offer deeper insights into biological mechanisms by describing system responses to environmental or genetic modifications. Metabolomic analyses therefore constitute a potent instrument for the discovery of biomarkers related to a physiological response and the diagnosis of complex phenotypes. Numerous applications have already been developed in a variety of research fields of the post-genomic era, from medical science to agriculture.

While early metabolic studies focused on preselected specific compounds, the first untargeted metabolic profiling applications were developed for medical diagnosis purposes 26, 27 and fungal strain classification 28. Such approaches still represent efficient tools for the early detection of diseases 29-31 and taxonomic purposes comparing genotypes or ecotypes 32.

The screening of drug candidates constitutes another promising application, by assessing the effects of metabolic modifications or toxicity 33, 34. Indeed, metabolomics offers essential information to model the dynamic behaviour of important metabolic pathways. Investigating the intracellular metabolism in a comprehensive manner will undoubtedly help to understand physiological processes related to therapeutic treatments. Besides, metabolomics has recently received increasing interest in the nutrition research field 35. Several applications are foreseen, such as food composition analysis, quality and authenticity assessment or monitoring of the physiological consequences of specific diets.

In addition, metabolomics constitutes an integral part of systems biology, and extensive studies have been performed in various species such as microorganisms 36, 37, plants 23, 38, 39 and humans 40. Metabolome analysis represents a key tool to uncover the roles of genes in functional genomics 20. Genotyping and phenotyping will certainly provide valuable information to link gene sequence to function.

As metabolomics is expected to greatly benefit agriculture, plant biochemistry, phytomedicine and natural products approaches, its potential in plant science is developed in more detail in the next section.

1.1.3 Arabidopsis thaliana, a model for plant metabolomics

Since its early days, plant science relied heavily upon observable features such as morphological characteristics, growth or fertility rates. More recently, genomics and proteomics greatly modified this perception by their ability to highlight so-called “silent phenotypes” that appear identical to wild types when considering only the morphology. Systems biology constitutes an increasing field of investigation in plant research as the recent profusion of information provided by modern analytical technologies allows an innovative insight into plant biology 5, 41. While a small number of plant genomes have been fully sequenced, including Arabidopsis thaliana (thale cress) and Oryza sativa (rice), metabolomics represents a powerful tool for the post-genomic study of such model plants and the metabolic variations induced by perturbations including environmental changes, physical stress, abiotic stress, nutritional stress or mutations 42. The reliable fingerprinting of plants by high throughput methodologies has become intensively used to characterise these phenomena.

The plant kingdom is known to produce a wide diversity of chemical molecules, estimated at 200,000 primary and secondary metabolites. While the relationship between useful characteristics and their related genes is a central issue in plant science, most of the known beneficial properties of plants can be attributed to metabolites, and a wide phenotypic description is also fundamental.

Metabolomics thus constitutes an essential part of plant functional genomics, and robust methodologies are needed to minimise the experimental error at each step of the experimental procedure. Numerous parameters, such as temperature, light and humidity, are critical, from plant cultivation in a controlled greenhouse, which maintains a degree of uniformity in plant growth, through to extraction and fractionation.

The flowering plant Arabidopsis thaliana (figure A3) is an essential model plant system that has been studied for a long time, and a large amount of information characterising this organism is currently available. Its small size and short generation time (about six weeks from germination to mature seed) have greatly contributed to its popularity in plant research. Its rather small genome allowed A. thaliana to be the first plant to have its genome fully sequenced, and additional information on gene structure and gene products is now accessible 43. Many biological processes have been at least partially described, including defence response 44-46, photomorphogenesis, gene regulation, development, metabolism 47, 48, transport and DNA repair. Data on metabolites, gene expression, genome maps and genetic and physical markers are therefore available.

Figure A3: The model plant Arabidopsis thaliana (22 days)


1.2 Analytical methods

1.2.1 General considerations

Metabolomics experiments involve interdisciplinary knowledge, from sample collection and preparation to analytical procedures, raw data processing and data analysis 49. The choice of an appropriate analytical technology therefore constitutes one of the critical issues in producing valid and reproducible data. Due to the high chemical diversity of the metabolome, complementary analytical techniques are required for its complete monitoring 50. Indeed, no automated analytical technique is able to measure thousands of unknown metabolites exhaustively and quantitatively. Moreover, sensitivity constitutes a major concern, since there is no means of amplification as in the case of transcriptome analysis. Multiple analytical platforms are already applied in metabolomic studies 51, such as direct infusion mass spectrometry (DIMS), hyphenated separation technique-mass spectrometry platforms, i.e. gas chromatography (GC) 52-56, liquid chromatography (LC) 57-59 or capillary electrophoresis (CE) 60-63, as well as nuclear magnetic resonance (NMR) 1, 64, 65 and Fourier transform-infrared spectroscopy (FT-IR) 66. The selection of the most appropriate methodology can be considered as a compromise between the chemical selectivity, the sensitivity and the speed of the different techniques. The analyses can also be performed in parallel on various platforms to offer an extended coverage of the metabolic diversity, as they provide complementary information 67-69. Figure A4 depicts the relative sensitivities of some of the most popular analytical platforms employed in metabolomics.

Figure A4: Relative sensitivities of the most widely applied analytical techniques in metabolomics (adapted from 2): NMR, 10^-6 mol; LC/UV, 10^-9 mol; GC/MS, 10^-12 mol; LC/MS, 10^-15 mol; LC/LIF, 10^-19 mol; CE/LIF, 10^-23 mol.

Besides, metabolomics experiments must be properly designed according to the biological question under investigation. The design of experiments (DOE) is a key process intending to handle the experimental parameters such as the sampling strategy, the number of replicates, the sequence of measurements or the extraction solvent, that have to be suitably defined. Such methodology is however beyond the scope of this manuscript and will not be further discussed.

1.2.2 Sample preparation & extraction

The early steps of sample preparation and extraction are of particular importance, especially in plant metabolomic studies. Once the relevant tissue to be investigated has been chosen, all metabolic processes must be stopped immediately after harvesting to avoid degradation or modification of metabolite concentrations. The simplest practice is a quick freezing of the sample in liquid nitrogen.

Furthermore, due to their physicochemical diversity, the metabolite content of the sample depends heavily on the subsequent extraction protocol. The most common extraction methods in plant metabolomics involve the homogenisation of the tissue with a mortar or a ball mill in an organic solvent or a mixture of solvents. Chloroform, isopropanol, methanol and water are frequently used for lipophilic metabolites, while methanol, ethanol and water are employed for polar compounds. Sample preparation can include derivatisation when performing GC-MS experiments, as discussed below.

1.2.3 Mass spectrometry

Mass spectrometry (MS) is a well-established analytical platform employed in many fields and one of the essential instruments in analytical laboratories. By measuring the mass-to-charge ratio (m/z) of elemental or molecular species, it allows the simultaneous detection of multiple analytes with high sensitivities. MS has been demonstrated as a potent platform for metabolomics thanks to its ability to detect metabolites present at low levels (µM concentrations) 70, 71. Moreover, MS high mass accuracy is able to provide structural information, as an exact molecular mass can be indicative of the molecular formula or fragments of the molecular structure 51. This precious information can lead to the confirmation of known low mass regulators or to the identification of new metabolites. The vast majority of recent metabolomic studies relies on this particular technique 32. The recorded data have to be accurate and reliable over the whole experimental procedure in order to highlight relevant metabolites of a physiological state and MS constitutes an appropriate solution.
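The link between an exact mass and a molecular formula can be made concrete with a short sketch. The function names are hypothetical, the isotope masses are standard monoisotopic values rounded to seven decimals, and glucose (C6H12O6) is used only as a familiar example rather than as a compound taken from this study.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of a few elements.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740,
    "O": 15.9949146, "P": 30.9737620, "S": 31.9720707,
}
PROTON = 1.00727646  # mass of a proton (Da), added for [M+H]+ ions

def monoisotopic_mass(formula: str) -> float:
    """Sum the isotope masses of a simple formula such as 'C6H12O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

def mz_protonated(formula: str) -> float:
    """m/z of the singly charged protonated molecule [M+H]+."""
    return monoisotopic_mass(formula) + PROTON

glucose_mass = monoisotopic_mass("C6H12O6")
glucose_mz = mz_protonated("C6H12O6")
print(round(glucose_mass, 4), round(glucose_mz, 4))
```

The parser handles only flat formulae without brackets or charges, which is sufficient for illustrating the principle.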

Four parts can be roughly distinguished when performing MS experiments, namely sample introduction, ion generation, mass separation and ion detection 72. Sample compounds are introduced into the ion source, where they become ionised; they are then separated in the gas phase according to their mass-to-charge ratio (m/z) in the analyser and finally detected. All these steps are summarised in figure A5 and discussed below.

Figure A5: Mass spectrometry principles (inlet: sample introduction; source: gas-phase ions; analyser: ion sorting; ion detector: ion detection; data system: data output as a mass spectrum).


Ion generation aims to create the charged species necessary to separate compounds according to their m/z ratio. Atmospheric pressure ionisation (API) methods have become widely popular for MS-based metabolomics experiments, as they offer soft ionisation conditions allowing the investigation of intact molecules. API methods include atmospheric pressure chemical ionisation (APCI) 73, atmospheric pressure photoionisation (APPI) 74 and electrospray ionisation (ESI) 75. ESI is a soft ionisation technique that can give rise to multiply charged ions. The sample solution passes through a needle held at high voltage; nebulisation from the resulting Taylor cone and successive droplet divisions produce ions that are further separated in the mass analyser 76. These techniques provide complementary analyses of both polar (ESI) and non-polar (APCI) metabolites. As particular metabolites will be preferentially detected in a specific ionisation mode, both positive and negative modes can be investigated to gather information 77.

When the compounds are ionised, they are transferred from the ion source to the mass analyser where they are separated according to their m/z ratio. Several mass analysers are available to measure these charged species generated by the ionisation phase and two approaches are classically described, i.e. so-called in space or in time separation. In space instruments include Time-Of-Flight (TOF/MS) 78, 79 and quadrupole analysers, while ion traps constitute the in time devices. Besides, the detection can be done either physically at a detector as an ion current or by detecting orbital frequencies as image currents.

Among in-space analysers, many quadrupoles scan mass ranges sequentially, while the TOF mass analyser is able to detect all ions simultaneously and continuously, enhancing the sensitivity. Recent quadrupoles, however, provide fast scanning abilities with sensitivities equivalent to those of TOF instruments.
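The principle of time-of-flight separation can be sketched with the idealised relation t = L * sqrt(m / (2 z e U)), where L is the drift length and U the accelerating voltage. The 1 m drift length and 20 kV voltage below are hypothetical values chosen for illustration, and the model ignores reflectrons, initial energy spread and every other instrumental refinement.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg

def flight_time(mz: float, drift_length_m: float = 1.0,
                accel_voltage_v: float = 20000.0) -> float:
    """Flight time (s) of an ion of given m/z (Da per elementary charge)
    in an idealised linear TOF analyser."""
    mass_per_charge = mz * DALTON / E_CHARGE   # kg per coulomb
    return drift_length_m * math.sqrt(mass_per_charge / (2.0 * accel_voltage_v))

t_100 = flight_time(100.0)
t_400 = flight_time(400.0)
print(t_100, t_400)
```

Because the flight time grows with the square root of m/z, an ion four times heavier arrives twice as late, and all ions from one extraction pulse reach the detector within the same acquisition.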

Quadrupole instruments and ion traps are considered low-resolution devices, while high-accuracy MS devices such as TOF/MS provide excellent mass precision. A mass accuracy below 5 ppm can be achieved by TOF instruments when using internal calibration or a lock mass 71, 80, but they suffer from a relatively small dynamic range of two or three orders of magnitude. This drawback can, however, be corrected by specific features of the instrument. Even a mass accuracy of less than 1 ppm is not always sufficient to allow the unambiguous identification of a single molecular formula 81, and structural isomers, which share the same formula, cannot be distinguished by mass alone. The quadrupole-time-of-flight analyser (Q-TOF) is a hybrid platform offering two means of compound identification, i.e. the precise measurement of the molecular ion and induced fragmentation to measure product ions. Handling such data, however, requires powerful software.
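The meaning of a ppm tolerance, and its limits for isomers, can be illustrated as follows. The measured value and the three-entry candidate list are hypothetical; the theoretical [M+H]+ values are rounded to four decimals. Glucose and fructose are isomers (C6H12O6), whereas theobromine (C7H8N4O2) has a different formula with a nearby mass.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

# Theoretical m/z of [M+H]+ ions for an illustrative candidate list.
CANDIDATES = {
    "glucose": 181.0707,      # C6H12O6
    "fructose": 181.0707,     # C6H12O6, isomer of glucose
    "theobromine": 181.0720,  # C7H8N4O2
}

def matching_candidates(measured_mz: float, tol_ppm: float = 5.0) -> list[str]:
    """Names of candidates whose theoretical m/z lies within the tolerance."""
    return sorted(name for name, mz in CANDIDATES.items()
                  if abs(ppm_error(measured_mz, mz)) <= tol_ppm)

measured = 181.0708                 # hypothetical measured value
matches = matching_candidates(measured)
print(matches)
```

With a 5 ppm window the formula C7H8N4O2 is excluded, but the two isomers remain indistinguishable, which is why fragmentation or chromatographic information is needed in addition to accurate mass.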

The Fourier transform ion cyclotron resonance analyser (FT-ICR) currently provides the highest mass accuracy (better than 1 ppm) and resolution 82, and it was recently introduced in the context of metabolomics 66. Ions orbit orthogonally to a powerful magnetic field, and a Fourier transform converts the recorded time-domain signal into the orbital frequencies from which m/z values are derived. The Orbitrap 83 relies on similar principles, but it applies electric fields and centrifugal forces to determine the orbital frequencies without requiring high magnetic fields. Like TOF instruments, it can be used in a hybrid configuration, coupled with a linear ion trap (LTQ-Orbitrap), to provide fragmentation information.
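The relation between orbital frequency and m/z in an FT-ICR cell can be sketched with the unperturbed cyclotron equation f = z e B / (2 pi m). In the simulation below, the 7 T field, the sampling parameters and the two ion species (m/z 200 and 400) are assumptions chosen for illustration; signal decay, noise and space-charge effects are ignored.

```python
import numpy as np

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg
B_FIELD_T = 7.0              # hypothetical magnetic field strength, tesla

def cyclotron_frequency(mz: float) -> float:
    """Unperturbed cyclotron frequency (Hz) of an ion with the given m/z."""
    return E_CHARGE * B_FIELD_T / (2.0 * np.pi * mz * DALTON)

def mz_from_frequency(freq_hz: float) -> float:
    """Inverse relation: m/z of an ion orbiting at the given frequency."""
    return E_CHARGE * B_FIELD_T / (2.0 * np.pi * freq_hz * DALTON)

# Simulated time-domain signal (transient) produced by two ion species.
sample_rate_hz = 2_000_000
n_points = 100_000
t = np.arange(n_points) / sample_rate_hz
transient = sum(np.cos(2.0 * np.pi * cyclotron_frequency(mz) * t)
                for mz in (200.0, 400.0))

# The Fourier transform turns the transient into a frequency spectrum.
spectrum = np.abs(np.fft.rfft(transient))
freqs = np.fft.rfftfreq(n_points, d=1.0 / sample_rate_hz)

# Keep the two strongest local maxima and convert them back to m/z.
is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])
peak_idx = np.flatnonzero(is_peak) + 1
top_two = peak_idx[np.argsort(spectrum[peak_idx])[-2:]]
recovered_mz = sorted(float(mz_from_frequency(f)) for f in freqs[top_two])
print(recovered_mz)
```

The frequency resolution of the spectrum, and hence the mass precision, improves with the duration of the recorded transient, which is one reason why long acquisition times favour high resolution in this sketch.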

1.2.4 Direct injection mass spectrometry

Direct injection mass spectrometry (DIMS) is the simplest and most direct approach to MS technology. It provides high-throughput screening by allowing relative automation 84 and is mainly used for sample classification, as its use is rather limited in terms of quantification and metabolite identification. High mass accuracy is desirable, and fragmentation through tandem mass spectrometry is usually employed, as it provides more information for the identification of metabolites through the detection of the molecular mass and the fragments. The simultaneous measurement of a large number of compounds may suffer from matrix effects, such as ion suppression, a common phenomenon in MS data, especially when performing DIMS analyses on complex samples 85.

1.2.5 Chromatography-mass spectrometry

The hyphenation of MS with prior separation techniques such as chromatography greatly increases the quality of the raw data generated 51, 86. The sequential introduction of compounds into the mass spectrometer enables higher sensitivity and allows more metabolites to be detected, but this gain involves an increase in analysis time. Two main separation methods are currently in extensive use, namely gas chromatography (GC) and liquid chromatography (LC). The advantages and disadvantages of their coupling to MS are briefly discussed in the next paragraphs.


1.2.5.1 Gas chromatography-mass spectrometry

Gas chromatography coupled with MS is well established and already extensively employed in metabolomic studies 87-89. Typical ionisation techniques are chemical ionisation, which minimises fragmentation, and electron impact, which produces fragments. A liquid sample is usually introduced into a heated injector, where it is quickly vaporised and mixed with a carrier gas. The compounds are separated on the GC column through successive equilibria between the gas phase and the liquid stationary phase coated on the inner wall of the column, and are finally detected 56.

GC-MS provides reproducible and accurate measurements of volatile compounds and fragmentation patterns of molecules with molecular weights lower than 350 Da 52, 54. Chemical derivatisation of semi-volatile compounds is required to increase their volatility and yields derivatives that can be separated on the GC column 90-92. This additional step can, however, increase the data complexity, while non-volatile or thermo-labile compounds cannot be measured by GC-MS experiments. GC-MS was historically the method of choice in the early developments of metabolomic studies 93, 94, and libraries have been developed to facilitate the identification of compounds 95. Recent developments include comprehensive GCxGC-MS, separating compounds with two columns of orthogonal properties 96.

1.2.5.2 Liquid chromatography-mass spectrometry

Liquid chromatography (LC) has been hyphenated with MS more recently than GC, and the applications of the two techniques are complementary (see figure A6) 97, 98. As it does not require prior derivatisation, LC can handle a broader range of molecular weights, and it has become increasingly popular for the analysis of biological samples owing to its high sensitivity, even though its resolution is poorer 99, 100. On the other hand, matrix effects influence the analysis of complex samples more heavily than in GC 101. LC separation depends on the equilibrium between a mobile phase and a stationary phase. Pumps are required to force the mobile phase through the system.

Another advantage of LC over GC resides in the large diversity of separation mechanisms offered by the available chromatographic phases, including normal phase (silica), reversed phase (C18, C8, C4, phenyl), HILIC (hydrophilic interaction chromatography) and ion exchange. The phase chemistry largely determines the nature of the metabolites that can be investigated, and assessing whole metabolomes is currently impossible using a single chromatographic system 58, 102, 103.
