THE DATA CITATION INDEX

AN INNOVATIVE SOLUTION TO EASE THE DISCOVERY, USE AND ATTRIBUTION OF RESEARCH DATA

NIGEL ROBINSON

12 FEBRUARY 2014

©2010 Thomson Reuters

OVERVIEW

• What is the Data Citation Index?

• Building the Data Citation Index

• Citation integration

DEPOSITION OF DATA BY RESEARCHERS

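[Bar chart. Survey question Q16: "Where do you place your non-traditional scholarly output to make it available to others?" (n=471). Response categories: Publisher website; Repository managed by a third party (e.g, domain-…); Department or institutional repository; Personal website; Other. Bar values as extracted (pairing with categories not recoverable): 24%, 36%, 47%, 51%, 17%.]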

RESEARCHERS NOT RECEIVING CREDIT

Barriers to creating and sharing data:

• Researchers are hesitant to spend time and effort creating and sharing data because they don’t feel the work is adequately exposed or credited

• Researchers are finding it difficult to expose data they have produced because data repositories do not have clear standards or mechanisms in place for doing so

RESEARCHER PROBLEMS

• Access & discovery

• Citation standards

• Lack of willingness to deposit and cite

• Lack of recognition / credit

IMPACT ON RESEARCH LIBRARIES

DATA CITATION INDEX AIMS

Launched October 2012, with 3.5M data records.

• Enable the discovery of data repositories, data studies and data sets in the context of traditional literature

• Link data to research publications

• Help researchers find data sets and studies and track the full impact of their research output

• Provide expanded measurement of researcher and institutional research output and assessment

• Facilitate more accurate and comprehensive bibliometric analyses

REPOSITORY SELECTION & EVALUATION

As we evaluate repositories for inclusion, some of the things we consider are:

• Editorial content: ensuring that material is desirable to the research community.

• Persistence and stability of the repository, with a steady flow of new information.

• Thoroughness and detail of descriptive information.

• Links from data to research literature.

REPOSITORY EVALUATION

Data deposit
• Repository must hold data
• Repository must provide access to data

Active
• Material added/updated
• Provide statistics on deposited data
• Actively curate data in the archive

Persistent
• Persistent IDs, DOIs or other permanent ID
• Contacts available for confirmation of interpretation
• Indication of intention to preserve data or provide access over the long term
• Contingency if repository were to cease to operate
• Make data accessible (or state licensing terms)
• Sustainable
• Funding information available for repository and deposited data

Proof of reuse
• Links to literature
• Citation in literature databases

DATA REPOSITORIES

• Over 1000 repositories identified

TYPES OF DATA BY DISCIPLINE

ART & HUMANITIES
• Cultural heritage
• Language corpus
• Image collections
• Recordings

SOCIAL SCIENCES
• Poll data
• Economic statistics
• Longitudinal data
• National census
• Public opinion surveys

SCIENCE & TECHNOLOGY
• Maps
• Algorithms
• Genomics
• Sky surveys
• Astrophysics
• Remote sensing
• Museum specimens

INDEXING A DATA REPOSITORY ON WEB OF KNOWLEDGE

• Repository/Source: Comprises data studies, data sets and/or microcitations. Stores and provides access to the raw data.

• Data Study: Descriptions of studies or experiments with associated data which have been used in the data study. Includes serial or longitudinal studies over time.

• Data Set: A single or coherent set of data or a data file provided by the repository, as part of a collection, data study or experiment.

• Microcitation (nanopublication): An assertion about concepts that have been found to be linked by scientific enquiry, and can be uniquely identified and attributed to its author. Made up of three separate parts: a subject, a predicate and an object.
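The three-part structure of a microcitation can be illustrated with a small data model. This is only a sketch: the class name, field names and the example values are hypothetical placeholders, not the Data Citation Index record schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microcitation:
    """Hypothetical model of a microcitation (nanopublication)."""

    identifier: str  # unique identifier for the assertion (assumed field)
    author: str      # who the assertion is attributed to (assumed field)
    subject: str
    predicate: str
    obj: str         # "object" of the assertion; named obj to avoid the builtin

    def as_triple(self):
        """Return the assertion as a (subject, predicate, object) tuple."""
        return (self.subject, self.predicate, self.obj)


# Placeholder values for illustration only.
example = Microcitation(
    identifier="example:0001",
    author="A. Researcher",
    subject="gene X",
    predicate="is associated with",
    obj="condition Y",
)
print(example.as_triple())
```

Keeping the identifier and author alongside the triple reflects the requirement that the assertion be uniquely identifiable and attributable.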

Record Types

[Diagram labels: Descriptive metadata feed from repository; Repository raw metadata is analysed; Metadata added; record types Repository, Data study, Data set, Microcitation.]

Search results within the Data Citation Index present the powerful Web of Knowledge options for exploring a body of information.

Link to all Data Citation Index content associated with this particular Repository.

Link out directly to the original item, in this case a Data Study.

DISCOVERY & CITATION

• Seed data

– Almost 4m data records from 150 repositories

– Around 2m citations

DATA CITATION

Current citation style: in the full text of the article, as informal citations

Desired/future citation style: as formally cited references

U.S. Dept. of Justice, Bureau of Justice Statistics (1996): MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, 1988. Version 1. Inter-university Consortium for Political and Social Research. http://dx.doi.org/10.3886/ICPSR09907.v1

Lee, Seung-Jae; Lee, He-Jin; Cho, Ji-Hoon; Rho, Sangchul; Hwang, Daehee (2008): GSE11574: The responses of astrocytes stimulated by extracellular α-synuclein. Gene Expression Omnibus. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11574
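The two example references share a visible pattern: authors (year): title. optional version. repository. identifier. A minimal sketch of assembling a citation in that pattern follows; the function name and parameters are hypothetical, and the pattern is inferred from the examples rather than taken from a published style specification.

```python
def format_data_citation(authors, year, title, repository, identifier, version=None):
    """Assemble a data citation in the pattern seen in the slide examples.

    All parameter names are illustrative assumptions.
    """
    parts = [f"{'; '.join(authors)} ({year}): {title}."]
    if version:
        parts.append(f"{version}.")
    parts.append(f"{repository}.")
    parts.append(identifier)
    return " ".join(parts)


citation = format_data_citation(
    authors=["U.S. Dept. of Justice, Bureau of Justice Statistics"],
    year=1996,
    title="MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, 1988",
    repository="Inter-university Consortium for Political and Social Research",
    identifier="http://dx.doi.org/10.3886/ICPSR09907.v1",
    version="Version 1",
)
print(citation)
```

Multiple authors are joined with semicolons, as in the Gene Expression Omnibus example.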

DATA CITATION

Lee, Seung-Jae; Lee, He-Jin; Cho, Ji-Hoon; Rho, Sangchul; Hwang, Daehee (2008): GSE11574: The responses of astrocytes stimulated by extracellular α-synuclein. Gene Expression Omnibus. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11574

[Diagram labels: Scientific literature; Published data sets; Data Citation Index; New data metrics.]

REQUIREMENTS & CHALLENGES

Metadata availability
• Lack of repository resources
• Lack of repository expertise

Citable data source

Metadata quality
• Metadata inconsistencies
• Consistent file format
• Consistent content
• Unique and persistent identifiers

Desirable characteristics

Metadata curation and quality control
• Required metadata fields present for citation
• Consistent metadata to allow mapping to DCI fields
• Data dictionary and schema available
• Data repositories are not static
• How is version control handled?

Formal Citation

Adequate metadata

DOI or persistent and unique identifiers

Partnerships
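The requirement that "required metadata fields [be] present for citation" can be illustrated with a simple completeness check. This is a sketch under stated assumptions: the list of required fields is hypothetical, modelled on the elements visible in the example citations, and is not the actual DCI field list.

```python
# Hypothetical set of fields needed to build a formal citation (an assumption).
REQUIRED_CITATION_FIELDS = ("authors", "year", "title", "repository", "identifier")


def missing_citation_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_CITATION_FIELDS if not record.get(field)]


# Placeholder record for illustration only.
incomplete_record = {"title": "Example data set", "year": 2008}
print(missing_citation_fields(incomplete_record))
```

A record with no missing fields returns an empty list, which a curation step could treat as citable.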

DATA CITATION INDEX

• Discovery of data most important to scholarly research

• Data linked to published research literature

• Measures of data citation, use and reuse with attribution assisted by identifiers

• New metrics for digital scholarship

Thank you

Nigel Robinson

nigel.robinson@thomsonreuters.com
