Dissecting PubMed: which content is covered by the Library? and Open Access?


Conference Presentation

Reference

Dissecting PubMed: which content is covered by the Library? and Open Access?

IRIARTE, Pablo, MULLER, Floriane Sophie

Abstract

Our project aims to measure how accessible PubMed's contents are. By downloading all PubMed metadata, enriching it with missing DOIs, and comparing it against our e-journal and print collections and against Open Access tools, we dissect full-text accessibility at our institution: how does the library fare with its online subscriptions and print journal collections? And which portion of PubMed is accessible to the general public via Open Access (OA)?

IRIARTE, Pablo, MULLER, Floriane Sophie. Dissecting PubMed: which content is covered by the Library? and Open Access? In: 16th European Association for Health Information and Libraries (EAHIL) Conference, Cardiff (UK), 9-13 July, 2018

Available at:

http://archive-ouverte.unige.ch/unige:106482


(2)

LIBRARY

Dissecting PubMed

Which content is covered by the Library? And Open Access?

Pablo Iriarte Floriane Muller

July 12th, 2018

(3)

Context

(4)

Institutional context at a glance

Established in 1559

UNIGE data 2017 (infographic): staff, students, professors (FTE), faculties, nationalities of foreign students; = 4’494.2 FTE

More than 500 programmes: 27 Bachelor’s degrees, 108 Master’s degrees, 80 doctoral programmes

Library data

• Open 343 days / year, 88 hours / week
• 16 342 m²
• 2757 seats
• 104.4 FTE (11.05 in Medicine)

(5)

A STORY FIRST

(6)

Our Research Questions

• What is the full-text coverage of PubMed at our library?

• Which portion of PubMed is accessible to the general public via Open Access (OA)?

o Are some of PubMed’s articles disconnected from their DOIs?
o Would finding those help increase the efficiency of OA tools?

(7)

Project Aims

• Try some tools (data science & OA)
• Observe Open Access trends within PubMed
• Discover how our library collections cover PubMed’s contents
• Improve contents’ visibility and accessibility at our institution

(8)

LIBRARY

METHODS

Pernell CC BY-SA 2.0

(9)

https://www.nlm.nih.gov/databases/download/pubmed_medline.html

(10)

Main Dataset

> 928 compressed files (gzip)

> Total size: 23 GB (~200 GB unzipped)

> 30’000 XML-formatted entries per file (status as of 08.02.2018)
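A minimal sketch of how one of these gzipped XML files could be streamed to list each PMID with the DOI already recorded for it, if any. It uses only the Python standard library and the element names of the PubMed XML format (`PubmedArticle`, `MedlineCitation/PMID`, `ArticleId` with `IdType="doi"`). The function name `iter_pmid_doi` and the lowercasing of DOIs are assumptions made for illustration; this is not the code from the presenters' notebooks.

```python
import gzip
import xml.etree.ElementTree as ET


def iter_pmid_doi(source):
    """Yield (pmid, doi_or_None) for every PubmedArticle in a gzipped file.

    `source` may be a file path or a binary file-like object.
    """
    with gzip.open(source, "rb") as handle:
        # iterparse streams the file, so a whole file never sits in memory
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag != "PubmedArticle":
                continue
            pmid = elem.findtext("MedlineCitation/PMID")
            doi = None
            # Only the article's own identifiers, not those of cited references
            for article_id in elem.iterfind("PubmedData/ArticleIdList/ArticleId"):
                if article_id.get("IdType") == "doi":
                    # Assumption: DOIs are compared case-insensitively, so lowercase them
                    doi = (article_id.text or "").strip().lower() or None
                    break
            yield pmid, doi
            elem.clear()  # release the record once it has been read
```

Records for which the DOI comes back as `None` are the candidates for the DOI enrichment step.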

(11)

Data Sources

• 27’836’723 PMIDs
• Swiss national Licences
• 11’348 journals
• 2631 journals
• 4’966’742 PMCIDs
• 93.7 million DOIs
• 21’840 print holdings (STM field)
• 128’449 ejournals
• 2 million records
• 92 million records
• 17.8 million records

(12)

Tools

XMLStarlet

> Command line utilities to parse, extract and transform XML files

> Python library for data manipulation and analysis

> Python library for algorithm optimisation (parallel computing)

> iPython Notebooks creation and collaboration tool

(13)

RESULTS

(14)

ACCESS OFFERED BY THE LIBRARY

Université de Genève, Jacques Erard

(15)

METHODS

21’840 print holdings (STM field) 128’449 ejournals

(16)

PubMed

Which content is covered by the Library?

(19)

PubMed

Which content is covered by the Library? (percentage)

(22)

Overall PubMed coverage offered by the Library

27’836’723 PMIDs

Full-text @ UNIGE = 73.5%
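A coverage figure of this kind can be thought of as the share of articles whose journal and publication year fall within at least one holding (print or electronic). The sketch below illustrates that idea only. The data layout (ISSN mapped to year ranges) and the names `is_covered` and `coverage_rate` are hypothetical and are not taken from the presentation.

```python
def is_covered(issn, year, holdings):
    """True if any holding range recorded for `issn` includes `year`.

    `holdings` maps an ISSN to a list of (start_year, end_year) tuples;
    end_year is None for a subscription that is still running.
    """
    return any(
        start <= year <= (end if end is not None else 9999)
        for start, end in holdings.get(issn, [])
    )


def coverage_rate(articles, holdings):
    """Share of (pmid, issn, year) articles covered by at least one holding."""
    if not articles:
        return 0.0
    covered = sum(is_covered(issn, year, holdings) for _pmid, issn, year in articles)
    return covered / len(articles)


# Toy data, invented for the example
holdings = {"1234-5678": [(1990, 2005), (2010, None)]}
articles = [
    ("1", "1234-5678", 2000),  # inside the first range
    ("2", "1234-5678", 2007),  # falls in the gap between ranges
    ("3", "1234-5678", 2018),  # inside the open-ended range
    ("4", "0000-0000", 2018),  # journal not held
]
rate = coverage_rate(articles, holdings)
```

A real comparison would also need to handle journals with several ISSNs (print and electronic) and embargoed access, which this sketch ignores.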

(23)

Overall PubMed coverage offered by the Library

(24)

PubMed’s Contents & OA

(25)

Various sources of (Open) Access

(26)

The missing DOIs Story

(27)

PubMed DOIs

(28)

Methods: Matching PMIDs to DOIs

Merging PubMed & Crossref

• 92 million references in CrossRef
• 28 million references in PubMed
• Using APD key (Author, start page, date)

Comparing results with Europe PMC data

• PPV = 0.45
• Sensitivity = 0.91
• Specificity = 0.30

Improving our results

• Levenshtein distance to measure differences between PubMed’s and CrossRef’s article titles

Merging all DOIs

• APD - improved
• Europe PMC PMID-DOIs
• DOIs already in PubMed
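The matching steps above can be sketched as follows: build an author / start page / date key on both sides, join on it, then keep a candidate DOI only when the two titles are close in Levenshtein distance. This is an illustrative sketch only. The record field names, the key normalisation and the `min_similarity` threshold of 0.9 are assumptions, not values from the presentation. `validation_metrics` uses the standard definitions of PPV, sensitivity and specificity.

```python
import re


def apd_key(first_author, start_page, year):
    """Author / start Page / Date key built from normalised parts."""
    author = re.sub(r"[^a-z]", "", first_author.lower())
    page = str(start_page).split("-")[0].strip().lower()
    return f"{author}|{page}|{year}"


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def title_similarity(t1, t2):
    """1.0 for identical titles, lower as the edit distance grows."""
    t1, t2 = t1.lower().strip(), t2.lower().strip()
    longest = max(len(t1), len(t2)) or 1
    return 1 - levenshtein(t1, t2) / longest


def match_dois(pubmed_records, crossref_records, min_similarity=0.9):
    """Map PMID -> DOI for records sharing an APD key and a similar title."""
    by_key = {}
    for rec in crossref_records:
        key = apd_key(rec["author"], rec["page"], rec["year"])
        by_key.setdefault(key, []).append(rec)
    matches = {}
    for rec in pubmed_records:
        key = apd_key(rec["author"], rec["page"], rec["year"])
        scored = [(title_similarity(rec["title"], c["title"]), c["doi"])
                  for c in by_key.get(key, [])]
        if scored:
            score, doi = max(scored)
            if score >= min_similarity:  # hypothetical threshold
                matches[rec["pmid"]] = doi
    return matches


def validation_metrics(tp, fp, tn, fn):
    """PPV, sensitivity and specificity from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

The title check is what filters out false matches from the key alone: two different articles can share a first author, start page and year, but rarely a near-identical title as well.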

(29)

Looking for the missing DOIs

(32)

PubMed

Existing / found DOIs (percentage)

Total DOIs in PubMed = 11’931’616; added DOIs = 7’510’309

(33)
(34)

PubMed

Which content is Open Access?

Total Open Access in PubMed = 25.2% (33% of contents with known DOI)

(39)

PubMed

Which content is Open Access? (percentage)

(40)

The state of OA

Figure 2: Number of articles (A) and proportion of articles (B) with OA copies, estimated based on a random sample of 100,000 articles with Crossref DOIs. 10.7717/peerj.4375/fig-2

“We estimate that at least 28% of the scholarly literature is OA (19M in total) and that this proportion is growing, driven particularly by growth in Gold and Hybrid. The most recent year analyzed (2015) also has the highest percentage of OA (45%).” Piwowar et al. (2018), 10.7717/peerj.4375

(41)

The state of PubMed OA

“We estimate that at least 28% of the scholarly literature is OA (19M in total) and that this proportion is growing, driven particularly by growth in Gold and Hybrid. The most recent year analyzed (2015) also has the highest percentage of OA (45%).” Piwowar et al. (2018), 10.7717/peerj.4375

We found that at least 33% of the PubMed literature with a known or found DOI is OA (6M in total) and that this proportion is growing. The most recent years analyzed are affected by embargo periods. 2015 has the highest percentage of OA (49%).

(42)

Accessibility for Swiss citizens

OA + Swiss National Licences

= 29.1% of PMIDs

(43)

Conclusions

(44)

Quick Recap

• Articles considered: 27’836’723 PMIDs
• PubMed’s growth: > 1 million articles / year
• Our UNIGE users have access to 74% of PubMed’s articles
• Swiss citizens have access to 29% of PubMed’s articles
• DOIs are (the missing) keys: 7’510’309 could be added
• 33% of PMIDs with known DOIs are OA = 6’131’801
• Embargoes’ impact on OA is visible
• OA evaluation depends on granularity

(45)

Next?

• Benchmarking
• Inclusion & promotion of OA tools & contents
• Systematic reviews
• Informed collection development decisions
• Visibility of our institution’s production (IR)

(46)

CC-By Николай Максимович

(47)

LIBRARY

Thank you for your attention

See you next year in Basel!

Pablo.Iriarte@unige.ch Floriane.Muller@unige.ch

Bibliothèque de l’UNIGE, 2018

This document is licensed under Creative Commons Attribution-ShareAlike 4.0 International License: http://creativecommons.org/licenses/by-sa/4.0/.

@pablog_ch

@Flor_Mu

notebooks + slides: www.purl.org/unige/eahil2018
