Semantic Linking and Integration of Researchers and Research Organizations in DISQOVER


Academic year: 2022


Filip Pattyn, Steven Vanderschaeve, Stijn Vermaere, Paolo Van Huffel, Kenny Knecht, and Hans Constandt

ONTOFORCE, Ottergemsesteenweg-Zuid 808, 9000 Gent, Belgium [email protected],

WWW home page: http://www.ontoforce.com

Abstract. Keywords: linked data, semantic web, ORCiD, GRID, data aggregation, smart searching

1 Background

ONTOFORCE has developed DISQOVER (http://www.disqover.com), a semantic search engine with faceted search capabilities for the life sciences. It currently allows users to search automatically across more than 115 different public data sources that are aggregated and interlinked and that contain information about 21 different data types.

This system uses semantic web technologies to build on the mapping efforts of different projects such as the Unified Medical Language System (UMLS), SNOMED CT, ICD10, ICD9, MedDRA, the Human Disease Ontology (DO), Medical Subject Headings (MeSH) and the Human Phenotype Ontology (HPO), amongst others. These projects structure and encode information related to diseases, phenotypes, and clinical signs.

Many of the sources included in DISQOVER are not available in a semantic web format. Therefore, we developed a data source update pipeline that constantly checks the update status of the data at its source. The data also needs a conversion step to a semantic web format (e.g. Turtle, .ttl) before it can be linked to other data sources.
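The conversion step can be illustrated with a minimal sketch, assuming a flat record whose field names are safe to append to a predicate base IRI. The function names and the `http://example.org/` IRIs are hypothetical and are not taken from the DISQOVER pipeline; a production converter would more likely rely on an RDF library.

```python
def turtle_literal(value):
    """Quote a Python string as a Turtle string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def record_to_turtle(subject_iri, record, predicate_base):
    """Serialise one flat record as a block of Turtle triples.

    Assumption: every field name can be appended to `predicate_base`
    to form a valid IRI, and every value is emitted as a plain string.
    """
    if not record:
        raise ValueError("a record needs at least one field")
    items = sorted(record.items())
    lines = [f"<{subject_iri}>"]
    for i, (field, value) in enumerate(items):
        # ';' continues the same subject, '.' closes the block.
        end = " ." if i == len(items) - 1 else " ;"
        lines.append(
            f"    <{predicate_base}{field}> {turtle_literal(str(value))}{end}"
        )
    return "\n".join(lines)


# Hypothetical record and IRIs, for illustration only.
ttl = record_to_turtle(
    "http://example.org/chem/1",
    {"name": 'Aspirin "ASA"', "formula": "C9H8O4"},
    "http://example.org/prop/",
)
print(ttl)
```

Once every source is expressed as triples with IRIs as subjects, linking two sources reduces to asserting that two IRIs denote the same concept.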

2 Results

Since we are trying to aggregate the information about identical concepts, we are investigating different mapping strategies. For example, data sources describing chemicals contain different identifiers depending on the original source of the data. ChEMBL and PubChem could be considered golden sources, and other identifiers such as a CAS number or an InChIKey could be considered for mapping too.
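One possible mapping strategy is to merge all records that share at least one identifier. The sketch below does this with a small union-find structure; it is an assumption about how such a strategy could look, not the method used in DISQOVER. The record keys (`source_a:1` and so on) are hypothetical, and the identifier values are used purely as illustration.

```python
from collections import defaultdict


def merge_by_shared_identifiers(records):
    """Group records that share at least one (scheme, value) identifier.

    `records` maps a hypothetical record key to a dict of identifiers,
    e.g. {"chembl": "CHEMBL25", "cas": "50-78-2"}.
    """
    parent = {key: key for key in records}

    def find(key):
        # Follow parent links to the representative, compressing the path.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    first_seen = {}  # (scheme, value) -> first record carrying it
    for key, identifiers in records.items():
        for scheme, value in identifiers.items():
            ident = (scheme, value.strip().upper())
            if ident in first_seen:
                # Same identifier seen before: join the two clusters.
                parent[find(key)] = find(first_seen[ident])
            else:
                first_seen[ident] = key

    clusters = defaultdict(set)
    for key in records:
        clusters[find(key)].add(key)
    return sorted(sorted(cluster) for cluster in clusters.values())


records = {
    "source_a:1": {"chembl": "CHEMBL25", "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
    "source_b:7": {"pubchem": "2244", "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
    "source_c:3": {"pubchem": "2244", "cas": "50-78-2"},
    "source_d:9": {"cas": "58-08-2"},
}
clusters = merge_by_shared_identifiers(records)
print(clusters)
```

Here `source_a:1` and `source_c:3` have no identifier in common, yet they end up in the same cluster because both are linked through `source_b:7`. This transitivity is the reason for using union-find rather than pairwise comparison, and it also means that a single wrong identifier can merge two unrelated concepts.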

One of the more challenging tasks in linking data is resolving person names in publications, patents, clinical trials and grant applications. Names can be misspelled, initials are sometimes used instead of a full first name, or one person may be recorded under different spellings. We used ORCiD (http://www.orcid.org), which provides a persistent digital identifier for researchers, as a golden source for persons. A researcher can create a personal profile on ORCiD and add his or her scientific output to it. This makes it possible to resolve names to actual persons based on the claimed scientific output linked to an ORCiD profile. Subsequently, we employ mapping techniques to map the remaining names to these ORCiD Uniform Resource Identifiers (URIs). As a result, user profiles are linked directly to clinical trials, publications, patents and grant applications, and indirectly to drugs, chemicals, proteins, genes, pathways and more.
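The two-step resolution described above can be sketched as follows. The text does not specify the mapping techniques, so the second pass here (accepting a name only when its normalised key matches exactly one profile) is an assumption chosen for illustration. The profile structure, the `orcid:EXAMPLE-*` identifiers, the document ids and the person names are all hypothetical.

```python
import unicodedata


def name_key(name):
    """Reduce 'Surname, Given' or 'Given Surname' to (surname, first initial).

    Limitation: without a comma, the last token is taken as the surname,
    so multi-word surnames are only handled in 'Surname, Given' form.
    """
    # Strip accents so that 'Müller' and 'Muller' compare equal.
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    plain = plain.lower().replace(".", " ")
    if "," in plain:
        surname, given = (part.strip() for part in plain.split(",", 1))
    else:
        parts = plain.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname, given[:1]


def link_names(profiles, mentions):
    """Map (document id, author string) mentions to profile identifiers."""
    by_key = {}
    for uri, profile in profiles.items():
        by_key.setdefault(name_key(profile["name"]), set()).add(uri)

    links = {}
    for doc_id, author in mentions:
        candidates = by_key.get(name_key(author), set())
        claimed = {u for u in candidates if doc_id in profiles[u]["claimed"]}
        if len(claimed) == 1:
            # Pass 1: the document is claimed on exactly one matching profile.
            links[(doc_id, author)] = next(iter(claimed))
        elif len(candidates) == 1:
            # Pass 2: unclaimed document, but the name key is unambiguous.
            links[(doc_id, author)] = next(iter(candidates))
    return links


profiles = {
    "orcid:EXAMPLE-1": {"name": "Peeters, Jan", "claimed": {"pmid:1"}},
    "orcid:EXAMPLE-2": {"name": "Peeters, Julie", "claimed": {"pmid:2"}},
    "orcid:EXAMPLE-3": {"name": "Müller, Anna", "claimed": set()},
}
mentions = [
    ("pmid:1", "J. Peeters"),
    ("pmid:2", "Peeters, J."),
    ("patent:9", "J Peeters"),
    ("trial:5", "Muller, A"),
]
links = link_names(profiles, mentions)
```

In this example the two "J. Peeters" mentions on claimed documents are resolved through the claims, the mention on `patent:9` stays unlinked because two profiles share the same key, and "Muller, A" is linked through the second pass. Leaving ambiguous names unlinked trades coverage for precision.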

Moreover, we try to resolve the different layouts and spellings of author affiliations in publications, clinical trials, patents and grant applications. The Global Research Identifier Database (GRID) (http://grid.ac) is a curated catalogue of research organizations with worldwide coverage. We digitize the author affiliations and map them to affiliation entries in other public data sources.
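As an illustration of how free-text affiliations might be matched against a curated catalogue, the sketch below uses simple token coverage: an organization matches when all tokens of its name occur in the affiliation string. This technique is a stand-in chosen for brevity, not the method used in DISQOVER, and the catalogue ids (`grid:example-*`) are hypothetical rather than real GRID identifiers.

```python
import re

STOPWORDS = {"of", "the", "and", "for", "at"}


def tokens(text):
    """Lower-case word tokens with punctuation and common stopwords removed."""
    found = re.findall(r"[a-z0-9]+", text.lower())
    return {token for token in found if token not in STOPWORDS}


def match_affiliation(affiliation, catalogue, threshold=1.0):
    """Return the catalogue id whose name is best covered by the affiliation."""
    affiliation_tokens = tokens(affiliation)
    best = None
    for org_id, name in catalogue.items():
        name_tokens = tokens(name)
        if not name_tokens:
            continue
        coverage = len(name_tokens & affiliation_tokens) / len(name_tokens)
        # Ties on coverage are broken in favour of longer, more specific names.
        candidate = (coverage, len(name_tokens), org_id)
        if coverage >= threshold and (best is None or candidate > best):
            best = candidate
    return best[2] if best else None


# Hypothetical catalogue entries, for illustration only.
catalogue = {
    "grid:example-1": "Ghent University",
    "grid:example-2": "Columbia University",
    "grid:example-3": "Ghent University Hospital",
}
first = match_affiliation("Dept. of Biology, Ghent University, Gent, Belgium", catalogue)
second = match_affiliation("Ghent University Hospital, Belgium", catalogue)
third = match_affiliation("Universiteit Gent", catalogue)
```

The third call returns `None`, which shows the main weakness of exact token coverage: translated or abbreviated organization names need alias lists or fuzzy matching on top of it.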

Overall, this work has led to a more in-depth linking of persons and organizations with other data types.
