• Aucun résultat trouvé

Supporting findability of COVID-19 research with large-scale text mining of scientific publications

N/A
N/A
Protected

Academic year: 2021

Partager "Supporting findability of COVID-19 research with large-scale text mining of scientific publications"

Copied!
1
0
0

Texte intégral

(1)

Supporting findability of COVID-19 research with

large-scale text mining of scientific publications

Danielle Welter

1

, Carlos Vega

1

, Maria Biryukov

1

, Valentin Grouès

1

, Soumyabrata Ghosh

1

, Reinhard Schneider

1,

Venkata Satagopam

1

1 Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg

The BioKB platform

1

Exploits text mining and semantic technologies to help researchers easily access semantic content of thousands of abstracts and full text articles

• Concepts from a range of contexts, including proteins, species,

chemicals, diseases and biological processes are tagged based on existing dictionaries of controlled terms

• Co-occurring concepts are classified based on their asserted

relationship and the resulting subject-relation-object triples are stored in a publicly accessible human- and machine-readable knowledgebase • All concepts in the BioKB dictionaries linked to stable, persistent

identifiers

Ø Resource accession such as an Ensembl2, Uniprot3 or

PubChem4 ID for genes, proteins and chemicals

Ø Ontology term ID for diseases, phenotypes and other ontology terms

Background

COVID-19 pandemic – lots of research efforts quickly redirected towards studies on SARS-CoV2 and COVID-19 disease

Ø Sequencing and assembly of viral genomes Ø Elaboration of robust testing methodologies

Ø Development of treatment and vaccination strategies • Flurry of scientific publications around SARS-CoV-2 and COVID-19

Ø Increasingly difficult for researchers to stay up-to-date with latest trends and developments in this rapidly evolving field

Extension of the underlying dictionaries to increase

the sensitivity of the text mining pipeline to viral data, including

Ø Additional viral species (via NCBI Taxonomy5

identifiers)

Ø Phenotypes from the Human Phenotype Ontology6

(HPO)

Ø COVID-related concepts including clinical and laboratory tests from the COVID-19 ontology7

Ø Additional diseases (DO8)

Ø Biological processes (GO9)

Références

Documents relatifs

Automated annotation of scientific publications in real-world digital libraries requires dealing with challenges such as large number of concepts and training examples,

As such any discovery, conformance, or extension algo- rithm of ProM can be used within a RapidMiner analysis process thus supporting process mining workflows..

The first tool identifies numeric properties in the patent text and normalises them, the second extracts a list of keywords that are relevant and reveal the invention in the patent

Several large-scale terminological resources are used by the OntoGene system in order to detect names of relevant domain entities in biomedical literature (proteins, genes,

As inter-commercial monochrome short segments are numerous in TV streams and are by definition visually very similar, they all fall into the same bucket af- ter product

Testing following DNA destruction: 1 Sample is taken from the patient using swab ⇒ 2 DNase is applied to the sample, destroying DNA ⇒ 3 A colourimetric method is applied ⇒ 4

The impact of the COVID-19 pandemic on the world of scientific publications: rapid publication and simplified access for articles about the virus, the jump in popularity

Abstract— We present an ongoing effort on utilizing text mining methods and existing biological ontologies to help readers to access the information contained in the