Data Computational Research

(1)

Amye Kenall

Journal Development Manager, Open Data

Gfii: Research data and the scientific publication, Institut Pasteur, Paris

12 February 2014

Data Computational Research

(2)

• Founded in 2000, bought by Springer in 2008

• BioMed Central publishes 260 open access journals

• ~25,000 peer reviewed research articles published annually

• Genomics and computational biology account for a significant fraction, e.g. Genome Biology, BMC Genomics, BMC Bioinformatics

• Other key fields include

Public Health / Global Health / Infectious Disease

Cancer

• All research articles are CC-BY licensed for reuse

• Since mid 2013, all data is covered by a CC0 rights waiver unless otherwise indicated

Open Data at BioMed Central

(3)

Strong encouragement to authors of all journals to provide underlying datasets; required on a select number of journals (e.g., Genome Biology, Genome Medicine, GigaScience)

CC0 + CC-BY 4.0 by default

To do…

Tabular data as CSV download

DOIs for all additional files

Searchability of additional files

OAI-PMH for additional files as a Virtual Data Repository to aid harvesting, e.g. by the Data Citation Index

Data reuse

Availability of Data section and Data Citation

Encourage use of ISA-TAB (especially GigaScience / GigaDB and BMC Research Notes)
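
To make the OAI-PMH item on this slide concrete, here is a minimal sketch of how a harvester such as a data citation index might request and read records for additional files. The `ListRecords` verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the base URL, the set name and the sample response are hypothetical and do not describe any real BioMed Central endpoint.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier and title from each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=OAI_NS),
            "title": rec.findtext(".//dc:title", namespaces=OAI_NS),
        })
    return records


# Hypothetical response, hand-written for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:additional-file-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Additional file 1: expression table (CSV)</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # "https://example.org/oai" and "additional-files" are placeholders.
    print(list_records_url("https://example.org/oai", set_spec="additional-files"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also need to follow the `resumptionToken` that OAI-PMH uses for paging through large result sets; that is left out here for brevity.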

(4)
(5)
(6)

Open API to retrieve information from GigaDB (URL → XML file)

• Only journal where the data and source code behind an article, plus the article itself, can be mined through an open API

• Only journal offering a home for complex image data (e.g., fMRIs) right next to the article

(7)

Linking and Citation

(8)

Already in place

• JMOL for 3D rendering of MOL/PDB

• Google Earth for geographic data (KML)

• Virtual microscope slides

• Mini-websites (generic)

To do...

• Movies as H263/MP4

• Interactive dataset visualization via JS

• Interactive visualization as part of reproducible data analysis

Data visualization at BioMed Central

(9)

Manipulatable 3D Files in PDF

(10)

Video Files in PDF

(11)

Deep Zoom

Electronic Lab Notebooks

(13)

Reproducibility of computational research

• Computational research should in principle be easier to replicate/reproduce than bench studies

• However, practical issues get in the way

• Even if source code is shared, reproducing the entire technical setup, gathering appropriate input data, and rerunning the analysis is a significant effort

• This means readers and even reviewers don’t bother

• We would like to reduce this ‘activation energy’

(14)

Strong interest from potential partners

(15)

Key technologies

(16)

Technologies + Partners + Journal Article

(17)


(18)

• Publishers have role in enforcement of community standards

• Public/academic databases can provide credible long term archiving guarantees for key data

• Academic grid computing infrastructure can give researchers access to large-scale computing resources

• Commercial cloud providers universalize/democratize access to large-scale computing. Even if you are not at an institution with its own facilities, you can carry out high-end computations. No bureaucracy/politics: simply pay per CPU-hour.

Complementary roles of publishers, academia, and cloud providers

(19)

Flexible management/deployment of packaged data/analysis suites using VM infrastructure

(20)

• To what extent can/should datasets be included in the VM/suite or pulled in externally? Where should they be hosted?

• To what extent are cross-domain standards for referring to and pulling in underlying datasets feasible? Dataset DOIs typically point to metadata rather than to the data itself.

• Multiple versions of datasets. To what extent is it practical, when dealing with evolving datasets/databases, to make them available as reproducible snapshots?

• Culture of data sharing. How to get authors to share their data?

Specific challenges with respect to data
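
One way to picture the "reproducible snapshot" question above is a small manifest that pins a dataset version to a checksum, so that a packaged analysis can check it is running against the same bytes the authors used. The sketch below is illustrative only: the manifest fields, the dataset name and the sample rows are invented, and it does not describe any existing BioMed Central or GigaDB mechanism.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a dataset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def make_manifest(name: str, version: str, data: bytes) -> dict:
    """Record what identifies one frozen snapshot of a dataset (hypothetical format)."""
    return {
        "dataset": name,
        "version": version,
        "bytes": len(data),
        "sha256": sha256_of(data),
    }


def verify_snapshot(manifest: dict, data: bytes) -> bool:
    """Check that the data in hand is exactly the snapshot the manifest describes."""
    return manifest["bytes"] == len(data) and manifest["sha256"] == sha256_of(data)


# Invented example data: a tiny CSV table and a later, extended version of it.
snapshot_v1 = b"gene,count\nBRCA1,12\nTP53,7\n"
snapshot_v2 = snapshot_v1 + b"MYC,3\n"

manifest = make_manifest("example-expression-table", "v1", snapshot_v1)

if __name__ == "__main__":
    print(verify_snapshot(manifest, snapshot_v1))  # same bytes as recorded
    print(verify_snapshot(manifest, snapshot_v2))  # dataset has since evolved
```

A checksum only detects that a dataset has changed; it does not solve where each snapshot is hosted or how long it is kept, which are the open questions raised on the slide.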

(21)

Culture of Data Sharing

• Data may mean the difference between getting a grant or not.

• Creators (understandably) prefer to hold the data until they have extracted all the possible publication value they can.

• Credit for data and source code is not institutionalised as it is for the article

• This behaviour comes at a cost for the wider scientific community.

(22)

an open data badge.

(23)

With big data and computational tools, research is becoming more “reproducible/reusable”

The infrastructure is out there and growing

What authors need to communicate their research is also changing, and as publishers we must respond

Clearly, publishers have a role, with other organisations, in setting some community standards

It took a few hundred years, but publishing is now getting exciting

Conclusions

(24)

Questions?

“One reason that the worldwide web worked was because people reused each other’s content in ways never imagined or achieved by those who created it. The same will be true of open data.”

– Tim Berners-Lee and Nigel Shadbolt, The Times, New Year’s Eve 2011

Amye Kenall

Journal Development Manager (Open Science), BioMed Central

@AmyeKenall (also @OpenDataBMC)
