
Scientific results traceability: software citation using GitHub and Zenodo


Academic year: 2021



Oceanography Data analysis Data citation Reproducibility SeaDataCloud ODIP

Python Jupyter GitHub Zenodo ORCID

What will we talk about?

DOI (Digital Object Identifier): unique alphanumeric string assigned to identify content and provide a persistent link to its location on the Internet.

GitHub: web-based hosting service for version control using git, http://github.com/

ORCID: persistent digital identifier that distinguishes researchers, https://orcid.org/

Zenodo: a repository to deposit scientific papers and/or research data, https://zenodo.org/

How can we foster reproducibility?

By making data, methods and results available and citable.

What?     Where         Examples
Data      Repositories  CMEMS INSTAC, …
Results   Catalogs      Sextant, SOCIB data products, …
Methods   Journals      Earth System Science Data, Scientific Data
Authors   Unique IDs    …

Unique identifiers for each component

DOI: minted on datasets and data products (e.g., climatologies)

ORCID: identifies authors

[Diagram: workflow linking ocean, observation, dataset, analysis, results, publication and science, with a software tool at the observation-to-dataset and analysis-to-results steps]

Is this sufficient to go from data to results?

Source: “English Communication for Scientists”, Nature, https://www.nature.com/scitable/ebooks/english-communication-for-scientists-14053993/writing-scientific-papers-14239285

The software tools should also be properly versioned and cited to document:

1. how the observations are converted to a dataset,

2. how scientific results are derived from the dataset.

Making the code available

If the code of the software tool(s) is open, there are several ways to distribute it:

▶ Journals: Geoscientific Model Development, Earth Science Informatics, Methods in Oceanography, …

▶ Platforms: …

▶ Hosting services: …

Research platforms

Online infrastructures whose objective is to persistently store and archive digital artifacts relevant to research: articles, data, images, code, model outputs, …

Comparison .

Comparison

Tool CKAN DSpace Figshare Zenodo

Open Source Yes Yes No Yes

Licence Affero GNU GPL v3.0 BSD – GPL-2.0 1st released November 2011 November 2002 January 2011 May 2013

Language Python Java – Python

Deployment Local Local Cloud Cloud

 with  No No Yes Yes

 with  Yes Not direct Yes Login

Let’s have a closer look at Zenodo

▶ ingests all research outputs, in any file format

▶ assigns DOIs so that files are uniquely citable

▶ is integrated into reporting lines for research via OpenAIRE.

Figure 1: Zenodo log in (options shown: GitHub, ORCID, create account). Logging in with other accounts is a great feature. When linked with GitHub, Zenodo creates a DOI every time a release is made in one of the enabled repositories.

We now have all the pieces to cite the code used in the research:

1. Zenodo login using GitHub or ORCID

2. Upload of software code to GitHub or to Zenodo

3. Generation of the DOI for a given version of the code

[Diagram: software tool, source code, login, upload]

Figure 2: Getting unique identifiers for software codes using Zenodo.
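The metadata that accompanies the DOI in step 3 can be kept alongside the code. The following is a minimal sketch, assuming the repository carries a `.zenodo.json` file that Zenodo reads when a GitHub release is archived. The field names (`title`, `upload_type`, `creators`, `keywords`) follow Zenodo's deposit metadata as commonly documented and should be checked against the current Zenodo documentation. The function name `build_zenodo_metadata`, the title, the author name and the affiliation are hypothetical placeholders, and the ORCID iD is the sample identifier used in ORCID's own documentation.

```python
import json


def build_zenodo_metadata(title, creators, keywords=()):
    """Return a dict shaped like a .zenodo.json file (field names are assumptions)."""
    if not creators:
        raise ValueError("at least one creator is required")
    return {
        "title": title,
        "upload_type": "software",
        # Keep only the creator fields that are actually filled in.
        "creators": [
            {key: value for key, value in creator.items() if value}
            for creator in creators
        ],
        "keywords": list(keywords),
    }


# Placeholder values, for illustration only.
metadata = build_zenodo_metadata(
    title="Example analysis toolbox",
    creators=[
        {
            "name": "Doe, Jane",
            "affiliation": "Example Institute",
            "orcid": "0000-0002-1825-0097",  # ORCID's documented sample iD
        }
    ],
    keywords=["oceanography", "reproducibility"],
)

# Text that could be saved as .zenodo.json at the root of the repository.
zenodo_json = json.dumps(metadata, indent=2)
print(zenodo_json)
```

Storing the author ORCID iDs in this file means that each new release, and hence each new DOI, is linked to the same author identifiers without manual editing on Zenodo.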

Example 1: from observations to dataset

The SOCIB glider toolbox is a set of MATLAB/Octave scripts to manage data collected by a glider fleet: data download, processing and figure generation.

Figure 3: Example of a glider section produced by the toolbox.

The code was developed on GitHub (https://github.com/socib/glider_toolbox, credits to T. Garau and J.P. Beltran) and was recently coupled to Zenodo.

DOI: 10.5281/zenodo.836706
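A DOI such as the one above becomes a persistent link once it is prefixed with the `https://doi.org/` resolver. The sketch below shows one way to normalise a DOI string and build that link. The helper names (`normalize_doi`, `doi_url`, `zenodo_record_number`) are hypothetical, and the regular expressions are a pragmatic approximation of DOI syntax, not a full validator.

```python
import re

# Approximate DOI shape: "10.", a registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# Zenodo mints DOIs under the 10.5281 prefix with a numeric suffix.
ZENODO_PATTERN = re.compile(r"^10\.5281/zenodo\.(\d+)$")

KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(text):
    """Strip whitespace and a resolver or 'doi:' prefix, then check the shape."""
    doi = text.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {text!r}")
    return doi


def doi_url(text):
    """Return the persistent link for a DOI."""
    return "https://doi.org/" + normalize_doi(text)


def zenodo_record_number(text):
    """Return the numeric suffix of a Zenodo DOI, or None for other DOIs."""
    match = ZENODO_PATTERN.match(normalize_doi(text))
    return int(match.group(1)) if match else None


toolbox_doi = "10.5281/zenodo.836706"  # glider toolbox DOI quoted above
print(doi_url(toolbox_doi))
print(zenodo_record_number(toolbox_doi))
```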

Example 2: the DIVA interpolation tool

History

1990s: Variational Interpolation Method (Fortran 77), 2D interpolations only

2006: SeaDataNet, code refactoring and set of bash scripts

2007: […] with ODV

2008: code in Subversion, distribution through GHER web page

2009: new modules in Fortran 90 for loops over depth and time

2012: new error calculation technique

2017: new version control system:

1. Switch from SVN to git, distribution via GitHub, sync with […]

2. Enable the DIVA repository in Zenodo

3. Edit the different tags on GitHub to get a DOI

Figure 4: Main page of the DIVA software in Zenodo. Note the ORCID logo next to the authors and the DOI relative to the code, as well as the “share on social media” and “cite as” options.

Putting all the pieces together

To ensure reproducibility and traceability, unique identifiers (DOI, ORCID) are attributed to:

▶ Datasets

▶ Software tools

▶ Authors

▶ Scientific results

Ideally, all the identifiers should be present in the published version of the research paper.
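Before submission, a short script can check that every group of identifiers listed above is present. The sketch below is illustrative only: the `PaperIdentifiers` class and its field names are hypothetical. The ORCID check uses the ISO 7064 MOD 11-2 check digit that ORCID documents for its identifiers, and the sample iD is the one from ORCID's documentation.

```python
from dataclasses import dataclass, field


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the length, digits and check character of an ORCID iD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16:
        return False
    base = compact[:15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == compact[15].upper()


@dataclass
class PaperIdentifiers:
    """Identifiers expected in the published paper (hypothetical structure)."""
    dataset_dois: list = field(default_factory=list)
    software_dois: list = field(default_factory=list)
    author_orcids: list = field(default_factory=list)
    result_dois: list = field(default_factory=list)

    def missing(self):
        """Return the names of the identifier groups that are still empty."""
        return [name for name, values in vars(self).items() if not values]

    def invalid_orcids(self):
        """Return the author ORCID iDs that fail the checksum."""
        return [o for o in self.author_orcids if not is_valid_orcid(o)]


record = PaperIdentifiers(
    software_dois=["10.5281/zenodo.836706"],  # glider toolbox DOI from Example 1
    author_orcids=["0000-0002-1825-0097"],    # ORCID's documented sample iD
)
print(record.missing())
print(record.invalid_orcids())
```

Here `missing()` reports the dataset and result groups, which are empty in this illustration, so the check would flag the paper as not yet fully traceable.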

[Diagram: workflow linking ocean, observation, dataset, analysis, results, publication and science, with the software tools, source code and notebooks, login and upload steps]

Figure 5: From data to final results: all the components are identified and citable.

Acknowledgements

The work presented in this poster has been developed in the frame of SeaDataCloud (Further developing the pan-European infrastructure for marine and ocean data management, Project ID 730960) and ODIP 2 (Extending the Ocean Data Interoperability Platform, Project ID 654310).

Charles Troupin, Cristian Muñoz, Juan Gabriel Fernández and Miquel Àngel Rújula

GHER_ULiege, socib_icts

https://github.com/gher-ulg/ https://github.com/socib/
