• Aucun résultat trouvé

OpenMinTeD: Text mining services for e-infrastructures

N/A
N/A
Protected

Academic year: 2021

Partager "OpenMinTeD: Text mining services for e-infrastructures"

Copied!
16
0
0

Texte intégral

(1)

HAL Id: hal-01543468

https://hal.archives-ouvertes.fr/hal-01543468

Submitted on 4 Jun 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Distributed under a Creative Commons Attribution| 4.0 International License

OpenMinTeD: Text mining services for e-infrastructures

Sophie Aubin

To cite this version:

Sophie Aubin. OpenMinTeD: Text mining services for e-infrastructures. Open Harvest, May 2017, Chania, Greece. pp.15. �hal-01543468�

(2)

twitter.com/openminted_eu

Sophie Aubin - INRA

Text mining services for e-infrastructures

Open Harvest, Chania, 3 June 2017

OpenMinTeD

(3)

Amounts of scientific texts...

Open Harvest, Chania, 3 June 2017

120,000 papers

published on a single taxon “Zea mays”

90% of papers never cited 1 paper/sec

50% of papers never read by

anyone than its authors, referees, journal editors

Publications

+ reports, patents, books,

surveys, news, etc.

(4)

TDM services

This is where the footer goes Open Harvest, Chania, 3 June 2017

Indexing documents and datasets

Entities identification and normalisation against

reference data

Information extraction: from

semi-structured to structured

Semantic/lexical resource

acquisition from texts

(5)

However, …

Multitude of solutions catering for different

Text Types Newswire

Scientific Literature Tweets/blogs

Patents

Clinical/medical records Textbooks, monographs Online forums

….

Languages English French German Spanish Portuguese Italian

Polish

….

Tasks

Translation

Knowledge acquisition Semantic Search

Question Answering Sentiment Analysis Summarization

Knowledge Discovery

….

Domains Agriculture Health Biology

Social Sciences Environment

….

Creating a fragmented landscape

Open Harvest, Chania, 3 June 2017

(6)

Then comes

This is where the footer goes

16 Partners:

- research groups in text-mining

- content providers - a data center

- a library association - legal experts

- community related partners - SMEs

Open Harvest, Chania, 3 June 2017

• Started: June 2015 , Duration: 3 years

(7)

What is ?

is a platform that works as an infrastructural

service of the wider research ecosystem

Open Harvest, Chania, 3 June 2017

(8)

Our Services

Open Harvest, Chania, 3 June 2017

Discover TDM Services and tools

Feed with home content or easily get texts from (OA) content hubs

Pick adequate knowledge resources

Share and Re-Use

Build your own service/applications –

Combine components into a Workflow

(9)

How does all this bind together?

Open Harvest, Chania, 3 June 2017

(10)

This is where the footer goes

OpenMinTeD Scientific/Business Applications

Data & content

repositories/registries

For Data analysts, Scientists Curation tools

For Data providers Analytical tools

For Scientists-policy makers Decision tools

For Farmers, SMEs

Knowledge acquisition tools For Scientists, Ontologists

Open Harvest, Chania, 3 June 2017

(11)

Example: WheatIS / gnpIS

Application: federated search of genomic and genetic data for wheat

Added value: direct access from data to related scientific articles

Objects: taxa, genes, markers, phenotypes and varieties

Challenges: naming heterogeneity &

scale variety between textual and experimental data

Open Harvest, Chania, 3 June 2017

(12)

data - text interoperability

experimental data Knowledge

textual data

Open Harvest, Chania, 3 June 2017

semantic interop.

pheno

web services

bib

>18,000 articles 300,000 genes 30,000 markers 220 taxa

208 phenotypes

(13)

Example: living conditions of food-related micro-organisms

Application: study and characterize the microbial biodiversity of food ecosystems (dairy or meat products, fish, wine, etc.)

- risk management

- food quality improvement

Added value: completion of knowledge databases with info from the literature

Objects: bacteria, habitats and phenotypes

Challenges: heterogeneous data integration, object identity

Open Harvest, Chania, 3 June 2017

(14)

Open Harvest, Chania, 3 June 2017

OpenMinTeD for agriculture

Data/Content Semantics

My decision app

End-user

GACS landvoc Agrovoc PO...

(15)

Text mining

- creates value from texts

- creates even more value if the results are linked to other data

This relies on shared semantics, standards & protocols This requires less tech competencies & resources thanks to the common e-infra

Open Harvest, Chania, 3 June 2017

(16)

twitter.com/openminted_eu facebook.com/openminted bit.do/openmintedlinkedin

vimeo.com/openminted bit.do/openmintedplus

THANK YOU!

Sophie Aubin sophie.aubin@inra.fr

twitter.com/openminted_eu facebook.com/openminted bit.do/openmintedlinkedin

vimeo.com/openminted bit.do/openmintedplus15

Références

Documents relatifs

Given a manually annotated sentence set S l which can be used as training data, a set of unannotated sentences S ul , a vocabulary V including words in the domain of the testing data

Open science makes scientific objects visible, accessible, reusable and linked. Introduction Open science needs linked

In this work, we construct an example of this type; more precisely, by solving a quasilinear elliptic boundary-value prob- lem we define a nonlinear operator on the space of

Cur- rently, NLPHub integrates 30 state-of-the-art text mining services and methods to recognize frag- ments of a text (annotations) associated with named abstract or physical

Nev- ertheless, Big Data technologies and massive clustering techniques are now possible so that the release of a TREC style collection, that is still missing, will help the IR and

In response to the above, text mining (TM) aims to automate the above process, by finding relations (such as interactions) that hold between concepts of different types

• Want to view a page of a document but see context of surrounding text and

− Understand the contents of a document or collection of documents without reading them.?. Spring 2004 CS