HAL Id: hal-02742064
https://hal.inrae.fr/hal-02742064
Submitted on 3 Jun 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Spectral Database: from data model to web interface
Nils Paulhe, Christophe Duperier, Daniel Jacob, Etienne Thévenot, Jean-Francois Martin, Claire Lopez, Franck Giacomoni
To cite this version:
Nils Paulhe, Christophe Duperier, Daniel Jacob, Etienne Thévenot, Jean-Francois Martin, et al.
Spectral Database: from data model to web interface. 8. Journées Scientifiques du Réseau Français de Métabolomique et Fluxomique RFMF, May 2014, Lyon, France. 2014. ⟨hal-02742064⟩
METABOHUB’S SPECTRAL DATABASE: FROM DATA MODEL TO WEB INTERFACE
NILS PAULHE 1, CHRISTOPHE DUPERIER 1, DANIEL JACOB 2,
ÉTIENNE THÉVENOT 3, JEAN-FRANCOIS MARTIN 4, CLAIRE LOPEZ 1, FRANCK GIACOMONI 1
1 METABOLISM EXPLORATION PLATFORM, INRA (CLERMONT-FERRAND/THEIX)
2 BORDEAUX METABOLOME PLATFORM, INRA (BORDEAUX)
3 THE METABOLOMEIDF, CEA (SACLAY)
4 METABOLOMICS AND FLUXOMICS PLATFORM, INRA (TOULOUSE)
INRA - Unite de Nutrition Humaine
Plate-Forme “Exploration du Métabolisme”
Centre de Theix
63122 Saint genes Champanelle France
http://www.clermont.inra.fr/plateforme_exploration_metabolisme
Methods
For the code architecture of the MetaboHUB's Spectral Database, we use modelling concepts and object-to-object relations to represent the complexity of mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy. The entities we manipulate, and the links between them, are based on the data and metadata that chemists use in their laboratory routine. Our work is split into three levels: data-model building, user-service design and code engineering.
Conclusion
The project is still in an early development phase, but some functional blocks are already being tested.
The first challenge was to validate the data-model and the use cases with chemists. We are currently testing the chemical library module with data provided by MetaboHUB partners.
Code architecture: the MVC pattern improves code maintainability, team development and security.
Light view of the data-model with its main entities: both reference and putative compounds are linked to spectra. A specific entity, metadata, is dedicated to linking spectra by their shared properties. The user entity is independent: it is used only for web-portal access control, not for ownership of spectra or compound data.
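The relations described in this light view can be sketched in Java as follows. This is only an illustrative sketch: the poster does not publish its class names, so `SpectrumMetadata`, `Spectrum`, `Compound` and `PortalUser`, as well as every field, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Shared properties (hypothetical keys) used to link spectra together. */
class SpectrumMetadata {
    final Map<String, String> properties = new HashMap<>();
}

/** A spectrum; several spectra may point to the same metadata object. */
class Spectrum {
    final String id;
    final SpectrumMetadata metadata;

    Spectrum(String id, SpectrumMetadata metadata) {
        this.id = id;
        this.metadata = metadata;
    }
}

/** A compound, reference or putative, linked to its spectra. */
class Compound {
    final String name;
    final boolean reference; // false means putative
    final List<Spectrum> spectra = new ArrayList<>();

    Compound(String name, boolean reference) {
        this.name = name;
        this.reference = reference;
    }
}

/** Portal user: holds no link to spectra or compounds (access control only). */
class PortalUser {
    final String login;

    PortalUser(String login) {
        this.login = login;
    }
}
```

In this sketch, two spectra built with the same `SpectrumMetadata` instance are linked through it, while `PortalUser` has no field pointing to any data entity.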
[schema 1 entities: NMR spectrum, NMR peak, NMR peak pattern, coupling constant, UV spectrum, peaks, users, putative compound, compound name, reference compound, spectrum metadata, chromatography]
Introduction
MetaboHUB is an infrastructure for metabolomics and fluxomics that provides tools to research teams and partners; a full work package focuses on the design and development of bioinformatics tools, including one called “MetaboHUB’s Spectral Database”. This tool is being created to gather and share knowledge and data from both mass and NMR spectra.
The core of the database, called the “data-model”, is a computational representation of each entity involved in spectra analysis and chemical compound annotation.
[schema 3 and schema 4 entities: NMR spectrum, NMR peak, NMR peak pattern, correlation spot]
Link spectra to compounds: spectra can match four types of compounds, whose common properties and relations are provided by a virtual “parent” object in the data-model. We provide reference compounds via a chemical library manager.
Putative compounds are annotated with an interactive identification tool and stored in the database. Each reference compound exists as three different entities, which allows us to match putative compounds that have missing properties (such as the 3D structure).
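One way to picture the virtual parent and the matching of incomplete putative compounds is the Java sketch below. All names (`AbstractChemicalCompound`, `ReferenceLevel`, `CompoundMatcher`), the choice of properties, the mass tolerance and the scoring rule are assumptions for illustration; the poster only states that matches carry a score.

```java
/** Virtual parent: properties common to every kind of compound. */
abstract class AbstractChemicalCompound {
    final String name;
    final Double mass;  // null when unknown
    final String inchi; // null when unknown

    AbstractChemicalCompound(String name, Double mass, String inchi) {
        this.name = name;
        this.mass = mass;
        this.inchi = inchi;
    }
}

/** The three nested levels of a reference compound (hypothetical naming). */
enum ReferenceLevel { GENERIC, SUB_STRUCTURE, STRUCTURE }

class ReferenceCompound extends AbstractChemicalCompound {
    final ReferenceLevel level;

    ReferenceCompound(String name, Double mass, String inchi, ReferenceLevel level) {
        super(name, mass, inchi);
        this.level = level;
    }
}

class PutativeCompound extends AbstractChemicalCompound {
    PutativeCompound(String name, Double mass, String inchi) {
        super(name, mass, inchi);
    }
}

class CompoundMatcher {
    static final double MASS_TOLERANCE = 0.01; // arbitrary value for the example

    /**
     * Fraction of the putative compound's known properties that agree
     * with the reference; properties the putative compound lacks are
     * skipped, and the score is 0 when nothing is known.
     */
    static double score(PutativeCompound p, ReferenceCompound r) {
        int known = 0;
        int agree = 0;
        if (p.mass != null) {
            known++;
            if (r.mass != null && Math.abs(p.mass - r.mass) < MASS_TOLERANCE) agree++;
        }
        if (p.inchi != null) {
            known++;
            if (p.inchi.equals(r.inchi)) agree++;
        }
        return known == 0 ? 0.0 : (double) agree / known;
    }
}
```

Because missing properties are skipped rather than counted as mismatches, a putative compound known only by its mass can still obtain a full score against a less specific reference entity.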
NMR spectra specificity: the data-model must support one-dimensional and multi-dimensional spectra (schema 4) and specific entities such as peak patterns and coupling constants (schema 3).
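A minimal Java sketch of how a single model could cover both one-dimensional and multi-dimensional NMR spectra is given below. The class names and fields are hypothetical; the only idea taken from the text is that peaks, peak patterns and coupling constants are entities of their own and that the number of dimensions varies.

```java
import java.util.ArrayList;
import java.util.List;

/** A peak located by one chemical shift (in ppm) per spectrum dimension. */
class NmrPeak {
    final double[] shiftsPpm;

    NmrPeak(double... shiftsPpm) {
        this.shiftsPpm = shiftsPpm.clone();
    }
}

/** A group of peaks with its coupling constants (in Hz). */
class NmrPeakPattern {
    final String multiplicity; // for example "d" for a doublet
    final List<Double> couplingConstantsHz = new ArrayList<>();
    final List<NmrPeak> peaks = new ArrayList<>();

    NmrPeakPattern(String multiplicity) {
        this.multiplicity = multiplicity;
    }
}

/** A spectrum with a fixed number of dimensions (1 for 1D, 2 for 2D, ...). */
class NmrSpectrum {
    final int dimensions;
    final List<NmrPeak> peaks = new ArrayList<>();

    NmrSpectrum(int dimensions) {
        if (dimensions < 1) {
            throw new IllegalArgumentException("a spectrum needs at least one dimension");
        }
        this.dimensions = dimensions;
    }

    /** Rejects peaks whose number of shifts differs from the spectrum dimension. */
    void addPeak(NmrPeak peak) {
        if (peak.shiftsPpm.length != dimensions) {
            throw new IllegalArgumentException("peak dimension does not match spectrum");
        }
        peaks.add(peak);
    }
}
```

In this sketch, a correlation spot of a two-dimensional spectrum is simply an `NmrPeak` built with two shifts.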
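[schema 2 labels: peak and spectrum match (with a score) an abstract chemical compound, the virtual parent, whose children are the putative compound (added via interactive identification, with a list of identifications) and the reference compound (added via the chemical library); reference compound entities: generic compound, sub-structure compound, structure compound; properties: name, mass, structure (canonical SMILES), InChI, 2D structure, 3D structure, mol file, id in external bank; note: used in NMR and lipidomics; example names: carboxylic acid, pinene, (1R)-(+)-α-pinene]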
The code architecture and the choice of frameworks are original features of this project. Developers use Java frameworks and a common generic toolbox to keep to the schedule and to ensure interoperability. The architecture has three levels: the spectral database core (figure 2), the webservices, and the web-portal, which follows the model-view-controller (MVC) pattern (schema 5); the core is used by both the webservices and the web-portal.
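The MVC split mentioned above can be illustrated with a deliberately small Java sketch. It does not reproduce the project's code: `CompoundModel`, `CompoundView` and `CompoundController` are hypothetical names, and the model is an in-memory list standing in for the database core.

```java
import java.util.ArrayList;
import java.util.List;

/** Model: owns the data (here an in-memory list of compound names). */
class CompoundModel {
    private final List<String> names = new ArrayList<>();

    void add(String name) {
        names.add(name);
    }

    List<String> all() {
        return new ArrayList<>(names); // defensive copy
    }
}

/** View: turns model data into something the user sees. */
class CompoundView {
    String render(List<String> names) {
        return "Compounds: " + String.join(", ", names);
    }
}

/** Controller: validates user input, updates the model, refreshes the view. */
class CompoundController {
    private final CompoundModel model;
    private final CompoundView view;

    CompoundController(CompoundModel model, CompoundView view) {
        this.model = model;
        this.view = view;
    }

    String addCompound(String name) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("name is required");
        }
        model.add(name.trim());
        return view.render(model.all());
    }
}
```

Keeping input validation in the controller is one reason the pattern is often associated with security: user input never reaches the model unchecked.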
There are two ways to access the data: through a web-portal or through a webservice. Each service on the web-portal has its own interface, with user access control.
Additional interfaces, such as spectral annotation tools, will be implemented later as plugins.
The core of the database is split into stand-alone blocks, which allows the community to re-use them in other projects. Each block is specific to one function.
It was designed using the knowledge of spectrometry experts; their experience will also be required to build an interactive identification interface for spectral analysis.
This database will be available through a web-portal, open first to the partners of the MetaboHUB consortium and later to the whole community.
Two milestones are coming: the first is to provide a mechanism to import spectral data into the data-model and feed the database; the second is to define metadata around spectral analysis.
We use a user-friendly framework (Twitter Bootstrap) to design the web interfaces.
Webservices open the data stored in the spectral database to bots, which can submit massive numbers of requests and run heavy computations. We use popular technologies such as REST and JSON.
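As a rough illustration of a REST-style request answered in JSON, the Java sketch below maps a resource path to a JSON body. The path `/compounds/{id}`, the class `CompoundResource` and the JSON fields are assumptions, not the project's actual API; a real service would sit behind an HTTP framework and a JSON library, and the escaping here is intentionally minimal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical read-only resource answering "GET /compounds/{id}". */
class CompoundResource {
    private final Map<String, String> namesById = new LinkedHashMap<>();

    void put(String id, String name) {
        namesById.put(id, name);
    }

    /** Returns the JSON body for the given path, or a JSON error object. */
    String get(String path) {
        String prefix = "/compounds/";
        if (!path.startsWith(prefix)) {
            return "{\"error\":\"unknown resource\"}";
        }
        String id = path.substring(prefix.length());
        String name = namesById.get(id);
        if (name == null) {
            return "{\"error\":\"not found\"}";
        }
        return "{\"id\":\"" + escape(id) + "\",\"name\":\"" + escape(name) + "\"}";
    }

    /** Minimal escaping: backslash and double quote only. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

A bot would then request one resource per identifier and parse the returned object, which is what makes large batches of requests straightforward to script.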
user services
computing
To store data in the database, we use the Hibernate framework. It preserves the relations and constraints of the model while abstracting away the technology used for SQL queries.
data model
Hiding the complexity of the full data-model makes it easier to understand: the first schema gives a light view of the entities and their relations, and the others focus on specific parts of it.
Some parts of the data-model are specific to spectral techniques (such as MS-MS); others are created for future developments (such as spectra annotation in biological matrices).
Interfaces and user access: some database queries are available to all users (get data); others require authentication (add or modify elements, data curation).
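The access rule stated above (open reads, authenticated writes) can be sketched in Java as follows. The enum `PortalOperation` and the class `AccessPolicy` are hypothetical; the operations are simply named after the examples given in the text.

```java
/** Operations named after the examples in the text; the enum is hypothetical. */
enum PortalOperation {
    GET_DATA(false),
    ADD_ELEMENT(true),
    MODIFY_ELEMENT(true),
    CURATE_DATA(true);

    final boolean requiresAuthentication;

    PortalOperation(boolean requiresAuthentication) {
        this.requiresAuthentication = requiresAuthentication;
    }
}

class AccessPolicy {
    /** Anonymous users may only run operations that need no authentication. */
    static boolean isAllowed(PortalOperation operation, boolean authenticated) {
        return authenticated || !operation.requiresAuthentication;
    }
}
```

For example, `AccessPolicy.isAllowed(PortalOperation.GET_DATA, false)` is true, while the same call with `ADD_ELEMENT` is false until the user is authenticated.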
Building “spectral annotation tools” is one aim of the MetaboHUB’s Spectral Database. We first have to create the structure that will host and link all the data provided by chemists.
screenshot 1: a basic use of the Spectral Database: show a compound card
[figure 1 labels: user inputs (request id list, request peak list, add compounds list / add peak list); spectral DB core; user outputs (spectra and/or compound list, matching list, matching view, success, fail)]
figure 1: three basic interfaces available in the web-portal
schema 3: NMR specific entities
schema 4: multi-dimensional NMR spectra
schema 1: light view of the data-model
schema 2: compounds matryoshka dolls model
[figure 2 blocks: core (spectral db api, data model, files parser, io-spectra-files, io-chem-files); ext. resources (connectors, external-tools, io-galaxy, external-banks)]
figure 2: core of the Spectral Database
[schema 5 labels: user, controller, model, view, database; relations: uses, manipulates, updates, sees, load/store; annotations: security core, user friendly]
schema 5: MVC structure