Wednesday 12 February 2014 Frank Hoonakker
From the ELN to the Data Warehouse
Applied to chemical reactions
Research chemistry
• > 50 000 companies*
• > 2 000 universities
• 600 000 chemists
– €5.4 billion (consumables)
– 300 reactions / chemist / year:
180 000 000 reactions / year
Failed reactions: between 50% and 70%
~ 3% published
170 000 000 / year «lost chemistry»
* EU & US
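The estimate above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted on the slide; the variable names are illustrative and the "~ 3% published" share is taken at face value.

```python
# Figures quoted on the slide (EU & US).
chemists = 600_000
reactions_per_chemist_per_year = 300
published_share = 0.03  # "~ 3% published", taken at face value

reactions_per_year = chemists * reactions_per_chemist_per_year
published_per_year = round(reactions_per_year * published_share)
unpublished_per_year = reactions_per_year - published_per_year

print(f"{reactions_per_year:,} reactions / year")
print(f"{published_per_year:,} published / year")
print(f"{unpublished_per_year:,} unpublished / year")
```

With exactly 3% published, the unpublished count comes out at about 175 million per year, which the slide rounds to 170 000 000.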
Bernard Werber: Nouvelle encyclopédie du savoir relatif et absolu
In scientific journals, only successful experiments are reported. But we should also report those that do not work. Due to lack of information, failed experiments are reproduced indefinitely by other scholars ignorant of their failure ...
Data accessible today
Publications: 3%
Companies: 27%
Lost chemistry (unexploitable): 70%
Example: CASREACT, 44 million reactions
150 000 per week
Only successful reactions
Not all fields are usable
Protocol not accessible (requires the publication)
Database content: unknown
Indiscriminate search engines
Chemist’s Workflow
Aldrich ($ 1 798 M),
Johnson Matthey (€ 9 105 M)…
Reagents
Data Publications
Information system
Archiving Publication
Search
Search Purchase
Changing Workflow 1:
Condensed Graph of Reactions (CGR)
Transformation of a reaction into a CGR
Conventional bonds: single,
double, aromatic …
Dynamic bonds:
single bond created, single bond broken, …
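To make the transformation concrete, here is a minimal sketch of a CGR built from atom-mapped bond tables. The names `bond`, `condensed_graph` and `describe`, the dictionary representation, and the toy substitution are assumptions made for illustration; this is not the eSniff implementation. A bond with the same order before and after the reaction is conventional; a bond whose order changes (0 meaning "no bond") is dynamic.

```python
from typing import Dict, FrozenSet, Tuple

Bond = FrozenSet[int]          # unordered pair of atom-map numbers
BondTable = Dict[Bond, int]    # bond -> order (1 = single, 2 = double, ...)


def bond(a: int, b: int) -> Bond:
    """Build an order-independent key for the bond between atoms a and b."""
    return frozenset((a, b))


def condensed_graph(reactants: BondTable, products: BondTable) -> Dict[Bond, Tuple[int, int]]:
    """Superpose both sides of a reaction: each bond gets (order_before, order_after)."""
    return {
        b: (reactants.get(b, 0), products.get(b, 0))
        for b in reactants.keys() | products.keys()
    }


def describe(before: int, after: int) -> str:
    """Label a CGR bond as conventional or as one kind of dynamic bond."""
    if before == after:
        return "conventional"
    if before == 0:
        return "created"
    if after == 0:
        return "broken"
    return "order changed"


# Toy substitution (hypothetical atom mapping): 1 = C, 2 = Br, 3 = O, 4 = C.
reactants = {bond(1, 4): 1, bond(1, 2): 1}   # C-C and C-Br
products = {bond(1, 4): 1, bond(1, 3): 1}    # C-C and C-O

cgr = condensed_graph(reactants, products)
dynamic = {b: orders for b, orders in cgr.items() if orders[0] != orders[1]}

for b, (before, after) in sorted(cgr.items(), key=lambda item: sorted(item[0])):
    print(sorted(b), describe(before, after))
```

In this toy case the C-C bond stays conventional, while C-Br (broken) and C-O (created) are the dynamic bonds that characterise the reaction.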
eSniff
Google for chemistry.
Similarity search for reactions:
• Easier (draw what you need)
• Faster (one query -> find the best)
• Smarter (new knowledge)
• More relevant (think as a chemist)
Use failed reactions:
• Anticipate side reactions
• Calculate protecting group stability
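The "one query, find the best" idea behind reaction similarity can be sketched as a ranking over feature sets. The slides do not say which measure eSniff uses, so the Tanimoto coefficient, the feature strings and the reaction names below are all assumptions chosen for illustration.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical features: one string per dynamic bond of a reaction.
query = {"C-Br:broken", "C-O:created"}
library = {
    "rxn_A": {"C-Br:broken", "C-O:created"},
    "rxn_B": {"C-Cl:broken", "C-O:created"},
    "rxn_C": {"C=O:order changed"},
}

# One query, best matches first.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)

for name in ranked:
    print(name, round(tanimoto(query, library[name]), 2))
```

Because the score only compares features, failed reactions stored with the same kind of features could be ranked alongside successful ones.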
Changing Workflow 2:
Generating the Data Warehouse
eShare
Publish (successes and failures)
ePro eSniff
Private DB
€
Academic
Feed
Exploit
Web 2.0
eSniff
eMol
Hotline
Security
Crowdsourcing
KNOWLEDGE
Changing Workflow: Do Chemists Agree?
Would you be interested in using these data?
Yes % / No %
Private
Would you be interested in publishing data?
Yes % / No %
Private
Changing Workflow: Today
eShare is starting with:
• Patents
• PhD theses
• Existing data in paper notebooks (ECLEIR project)
Today:
• 800 users
• 1 000 000 reactions
In 3 years:
• 6 500 users
• 1 800 000 reactions
In 5 years:
• 10 400 ePro users
• 2 500 000 reactions
Leader in «lost chemistry»
[Chart: chemists and data; axis ticks from 0 to 35 000 and from 0 to 20 000 000]
Changing Workflow: Tomorrow
Properties of reactions: author, date, reaction, conditions, CGR signature
Tomorrow: similarity, selectivity
Properties of compounds: IC50, author, ΔS, EC50, [C], log K
Chemists
Other fields: biology, pharmacology, …
eShare
Conclusion
• Data exist, but are not easily accessible
• Both failed and successful experiments are needed
• Scientists agree to publish some of them, but need an easy way to do so
• The most natural way is to use an ELN
• The laboratory notebook is the starting point for all knowledge…
• eNovalys proposes an easy path from the ELN to the data warehouse
for chemists and associated scientists.