Open Datasets for Evaluating the Interpretation of Bibliographic Records


HAL Id: hal-01302830

https://hal.archives-ouvertes.fr/hal-01302830v2

Submitted on 18 Oct 2016

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Open Datasets for Evaluating the Interpretation of Bibliographic Records

Joffrey Decourselle, Fabien Duchateau, Trond Aalberg, Naimdjon Takhirov, Nicolas Lumineau

To cite this version:

Joffrey Decourselle, Fabien Duchateau, Trond Aalberg, Naimdjon Takhirov, Nicolas Lumineau. Open Datasets for Evaluating the Interpretation of Bibliographic Records. Joint Conference on Digital Libraries, Jun 2016, Newark, United States. pp. 253-254, ⟨10.1145/2910896.2925457⟩. ⟨hal-01302830v2⟩


[Figure omitted: 4 – Extract of a unit test from T42]

1 – Background

FRBRization is a metadata migration process which aims at extracting FRBR entities from MARC records.

• Crucial for the adoption of Semantic Web technologies in libraries

• Many tools have been proposed over the last decades to perform this migration

• No benchmark to compare and evaluate these tools

We provide two open datasets dedicated to the evaluation of FRBRization tools. They account for the specificities of MARC catalogs, such as cataloguing practices, inconsistencies, and bibliographic patterns.
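To make the idea of FRBRization concrete, here is a minimal sketch in Python. It is not the method of any tool mentioned on the poster. It assumes records are simplified to flat dictionaries keyed by MARC 21 tag (001 control number, 100 main author, 240 uniform title, 245 title, 041 language code), whereas real MARC fields carry indicators and subfields. The grouping rule (same author and uniform title means same Work, same language means same Expression) and the function name `frbrize` are illustrative assumptions, and the record values are placeholders.

```python
from collections import defaultdict

# Placeholder records, flattened to {MARC tag: value}. Real MARC fields have
# indicators and subfields; this simplification is an assumption of the sketch.
records = [
    {"001": "r1", "100": "Author A", "240": "Original Title", "245": "Original Title", "041": "eng"},
    {"001": "r2", "100": "Author A", "240": "Original Title", "245": "Translated Title", "041": "fre"},
    {"001": "r3", "100": "Author A", "240": "Original Title", "245": "Original Title", "041": "eng"},
]


def frbrize(marc_records):
    """Group records into Work -> Expression -> Manifestation (hypothetical rule)."""
    works = defaultdict(lambda: defaultdict(list))
    for rec in marc_records:
        # Work key: author plus uniform title, falling back to the title proper.
        work_key = (rec.get("100"), rec.get("240") or rec.get("245"))
        # Expression key: language code, "und" (undetermined) when absent.
        expression_key = rec.get("041", "und")
        # Each record is treated as one Manifestation, identified by its 001.
        works[work_key][expression_key].append(rec["001"])
    return {work: dict(expressions) for work, expressions in works.items()}


result = frbrize(records)
for work, expressions in result.items():
    print(work, expressions)
```

With these placeholder records, the three MARC records collapse into a single Work with two Expressions (English and French). The cataloguing issues listed on the poster, such as a missing uniform title or inconsistent identifiers, are exactly what would make such a naive grouping rule fail.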


Joffrey Decourselle ¹, Fabien Duchateau ¹, Trond Aalberg ², Naimdjon Takhirov ³ and Nicolas Lumineau ¹

¹ LIRIS, UMR5205, Université Lyon 1, Lyon, France

² NTNU, Trondheim, Norway

³ Westerdals - Oslo School of Arts, Communication and Technology, Faculty of Technology, Oslo, Norway

2 – Specificities of MARC records

Cataloguing practices and inconsistencies:

• Missing information (missing publication information or authoritative data, leading to misinterpretation).

• Linkage errors (errors in title or responsibility identifiers, leading to dead links between records).

• Cataloguing practices and norms (specific forms of data in the record, e.g., ISBD punctuation).

Bibliographic patterns:

• Core pattern (basic bibliographic cases)

• Augmentation pattern (any addition of a Work)

• Derivation pattern (intellectual modification)

• Aggregation pattern (whole-part relationships)

• Complementary pattern (other related works)

[Figure omitted: Example of derivation patterns in FRBR (adaptation and translations)]

3 – Open Datasets

Both datasets include MARC files and a FRBR gold standard.

Features                    T42          BIB-RCAT
Number of unit tests        42           -
Number of collections       126          3
Number of languages         3            1
Number of media types       8            4
Average MARC records        10 / test    560
Average fields / record     18           17
Average FRBR entities       73 / test    1922
Average FRBR properties     241 / test   9517

http://bib-r.github.io/

• T42 allows the evaluation of a migration tool in terms of bibliographic patterns and cataloguing issues.

• BIB-RCAT offers a larger collection for evaluating the interpretation of MARC records in a real-world context.
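Because both datasets pair MARC files with a FRBR gold standard, a tool's output can be scored against the expected entities. The poster does not say which metrics to use, so the sketch below assumes set-based precision, recall and F1 over (entity type, identifier) pairs. The function name `precision_recall_f1` and the identifiers are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted FRBR entities with a gold standard, both given as sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold standard for one unit test: (entity type, identifier).
gold = {
    ("Work", "w1"),
    ("Expression", "e1"),
    ("Expression", "e2"),
    ("Manifestation", "m1"),
}

# Hypothetical tool output: misses one Expression, adds a spurious Manifestation.
predicted = {
    ("Work", "w1"),
    ("Expression", "e1"),
    ("Manifestation", "m1"),
    ("Manifestation", "m2"),
}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Scoring each unit test separately, as the per-test averages for T42 suggest, would show which bibliographic pattern or cataloguing issue a tool handles poorly. Comparing relationships and properties as well as entities would need a richer representation than the pairs used here.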
