• Aucun résultat trouvé

PeakForest: a spectral database and its toolbox, dedicated to the Metabolomics’ community

N/A
N/A
Protected

Academic year: 2021

Partager "PeakForest: a spectral database and its toolbox, dedicated to the Metabolomics’ community"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-02744357

https://hal.inrae.fr/hal-02744357 Submitted on 3 Jun 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

PeakForest: a spectral database and its toolbox, dedicated to the Metabolomics’ community

Nils Paulhe, Christophe Duperier, Marie Lefebvre, Pierrick Roger-Mele, Christophe Junot, Etienne Thévenot, Franck Giacomoni, Estelle Pujos-Guillot

To cite this version:

Nils Paulhe, Christophe Duperier, Marie Lefebvre, Pierrick Roger-Mele, Christophe Junot, et al.. PeakForest: a spectral database and its toolbox, dedicated to the Metabolomics’ community. 12. annual conference of the Metabolomics Society, Jun 2016, Dublin, United Kingdom. 2016. �hal-02744357�

(2)

PEAKFOREST: A SPECTRAL DATABASE AND ITS TOOLBOX,

DEDICATED TO THE METABOLOMICS’ COMMUNITY

NILS PAULHE

1

, CHRISTOPHE DUPERIER

1

, MARIE LEFEBVRE

4

, DANIEL JACOB

4

, PIERRICK ROGER-MELE

2

,

CHRISTOPHE JUNOT

3☨

, ÉTIENNE THÉVENOT

2☨

, FRANCK GIACOMONI

1

AND ESTELLE PUJOS-GUILLOT

1☨

1: PFEM, UMR1019, INRA, CENTRE CLERMONT-FERRAND-THEIX, 63122 SAINT GENÈS CHAMPANELLE, FRANCE

2: CEA, LIST, LABORATORY FOR DATA ANALYSIS AND SYSTEMS' INTELLIGENCE (LADIS), METABOHUB PARIS, F-91191 GIF-SUR-YVETTE, FRANCE

3: LABORATOIRE D’ETUDE DU MÉTABOLISME DES MÉDICAMENTS, DSV/IBITEC-S/SPI, METABOHUB PARIS, CEA SACLAY, 91191 GIF-SUR-YVETTE, FRANCE 4: BORDEAUX METABOLOME FACILITY, BORDEAUX FUNCTIONAL GENOMICS CENTER, METABOHUB PMB, INRA, 33140 VILLENAVE D'ORNON, FRANCE. ☨: METABOHUB WOKPACKAGE LEADER

INRA - Unite de Nutrition Humaine

Plate-Forme “Exploration du Métabolisme” Centre de Theix

63122 Saint genes Champanelle France

http://www.clermont.inra.fr/plateforme_exploration_metabolisme

Introduction

The French National Infrastructure in Metabolomics & Fluxomics (MetaboHUB) provides a wide range of services for the untargeted metabolomic analysis of

biofluids, tissue samples, cell extracts and environmental matrices, through its

network of platforms. To increase the high-throughput annotation and de novo

identification of metabolites, the consortium is producing high quality reference MS, MS/MS and NMR annotated spectra.

The PeakForest spectral library is being developed to allow centralizing and

sharing spectral data from reference compounds and metabolic profiles, as

well as annotation tools. The originality of the PeakForest project is based on the database structure and on spectral data quality. A system managing the data life

cycle (import, enrich then re-use data in an annotation workflow) built on the

multidisciplinary expertise of biologists, chemists, analysts and software designers, is proposed.

PeakForest

ANR-INBS-0010 Data and services: PeakForest v1.1 already provides basic functionalities like a

database of metabolites and their reference and profile spectra but also querying tools

and data visualization plugged on it.

It provides templates and user-friendly interfaces in order to collect metabolites and

spectral data (peak lists and metadata) from proton 1D-NMR and LC-MS instruments.

Spectra annotations are collected via templates of reference spectra. In mass spectra we gather all peaks (parent ion, isotopes, adducts and in-source fragments); In NMR spectra, each peak is labeled to a molecule atom number (figure 1). For experimental data we added the possibility to refer parts of a fingerprint matching an unknown compound to a “putative compound”.

Originalities: Every data-provider can improve the database quality by adding “curation messages” to compounds or spectra sheets. Then a curator user is informed of the potential error and has extra rights to correct or enrich existing data.

Reference datum Profile Spectra biological knowledge ref. spectra ref. compounds annotations similarity / comparison

figure 1 - PeakForest use cases, showcase with NMR datum

Contact us! interested?

what about YOUR own PeakForest?

PeakForest.org is MetaboHUB’s showcase to provide its reference datum to the whole metabolomics community. But PeakForest is also a project with a standardized format available through its Java environment where you can create your own

database! Thanks to virtualization technologies we are able to

quickly deploy and host new instances of this tool.

Furthermore the PeakForest development team is available to

maintain and update new releases of PeakForest on your own

instance! If you are interested, please contact us :-) ! <contact@peakforest.org>

*.peakforest.org

PeakForest.org

foodball.peakforest.org

inra.peakforest.org

Why? it’s relevant to create private sub instances of PeakForest specialized in a scientific domain (e.g.: the “foodball” project focused on food metabolome) or to store and process

unpublished results.

Data agreement in return: the main instance of PeakForest may be enriched thanks to these sub-datasets after result publication or in order to cover wide metabolomics fields.

PeakForest version 2.0: new functionalities including, 2D-NMR and LC-MS/MS spectra management are scheduled. The PeakForest toolbox is ready to host services compliant with these data. At the end of 2016, this version of PeakForest will provide storage and services for annotation of biological reference matrices. Long-term functionalities like GC-MS spectra management and MetExploreViz integration (cf poster 47 “MetExploreViz: Visualisation of metabolic online” by Chazalviel et al.) are planed for a 2.5 or 3.0 release in 2017.

Perspectives

v 1.1

beta

v 1.0

alpha

v 1.5

beta

v 2.0

stable

v 2+

stable

• store LCMS and NMR proton 1D spectra

• MS spectra viewer

• NMR peakmatching search tool

• NMR spectra viewer

• MS peakmatching search tool

• store NMR 13C 1D spectra

• MassBank export

• galaxy tool

• store MSMS spectra

• store 2D NMR spectra

• spectra from profiles of

biological matrices

• support GCMS


(data/viewer/matching)

• MSMS matching

• MetExplore Viz

• …

late 2015 early 2016 mid 2016 late 2016

a: OLBoyle, Noel M., et al. "Open Babel: An open chemical toolbox." J Cheminf 3 (2011): 33.


b: Giacomoni, Franck, et al. "Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics." Bioinformatics 31.9 (2015): 1493-1495.

c: Merlet, Benjamin, et al. "A Computational Solution to Automatically Map Metabolite Libraries in the Context of Genome Scale Metabolic Networks." Frontiers in molecular biosciences 3 (2016).

references: PeakForest’s WebService PeakForest web application tomcat server mysql database ubuntu server R env. + libs. openbabel(s) Perl libraries

figure 2 - host server configuration

Computational specifications: PeakForest core API is split in different java libraries used in two major projects: the PeakForest Spectral Database, a user-friendly web-application (reference instance at

peakforest.org) and REST webservices, a bot-friendly

API (demo and documentation at doc-ws.peakforest.org.

The host server requires a custom configuration in order to support third part tools plugged on it like “peak-matching” tools or the chemical format converter

library “OpenBabel” (OLBoyle et al. 2011a); cf figure 2.

toolbox

Hosted tools and open toolbox: PeakForest’s online querying and peakmatching tools (lcmsmatching by Pierrick Roger-Mele and NMR peakmatching by Marie Lefebvre) are available to all users. Homemade tools can also be plugged into PeakForest thanks to PeakForest REST API: it provides different querying services in order to extract spectra subsets ready to be used by

any MS / chemical shift matching algorithms (cf figure 3).

The main purpose of this bot-friendly interface is to open stored datum and use them as a resource for computer aid annotation tools in high-throughput metabolomics context.

A proof of concept has been developed through a Galaxy tool, soon available via

workflow4metabolomics.org W4M platform (Giacomoni et al. 2015b), the collaborative portal

dedicated to metabolomics data processing analysis and annotation for the Metabolomics community. We are working on the possibility to call any instance of PeakForest, even private, through this galaxy tool, via a token management system.

Cross service exchanges between databases: webservices are used to link MetaboHUB's tools for mutual enrichment! e.g. we can provide chemical informations on MetExplore’s networks (cf poster 40 “ MetExplore: handling genome scale metabolic networks online” by Jourdan et al.) from PeakForest compounds sheets and use MetExplore’s mapping coverage

statistics in Peakforest (Merlet et al., 2016c).

figure 3 - query PeakForest via a REST client link to peakforest sheet REST client PeakForest’s WebService id spectrum spectrum data compound data peak list (MS: m/z + Δ + ionization mode + RI) (NMR: pH +chemical shift+ Δ + acquisition + RI) JSON GET/REST name / formula / mass / … peakmatching algorithms spectra viewer

Références

Documents relatifs

Search screen permitting to search for certain expression patterns (A.) for microarray data (the example query shown is “Show the genes which are upregulated by a factor

In this paper we present a new Analysis/Synthesis method named SINOLA, which benefits from both sinusoidal additive model and OLA/PSOLA method, and which allows adequate

Figures 4.12 to 4.16 present, respectively, the normalized RMS errors for the non- directional ocean wave spectrum, mean wind direction, spreading parameter, signifi- cant wave

ℓ -group; f -ring; real-closed; condensate; ideal; prime; radical; spec- tral space; spectrum; ℓ -spectrum; Brumfiel spectrum; real spectrum; representable; sober; Stone

The entities TREE, DNA-SAMPLE, PHENOTYPE and MOLECULAR DATA allow multi-annual observations to be stored as individual samples of individual trees, even if the nature of

Light view of the data-model with main entities: both reference and putative compounds are linked to spectra.. A specific entity, metadata, is dedicated to link

This approach bypassing angles and aberration yields a symmetrical relativistic Doppler (SRD) formula with some properties distinct from those of the traditional relativistic

Several methodologies for the dynamic representation of objects and their properties have already been proposed. Among the most well known are the temporal description logic,