(1)

Technologies du Web Master COMASIC

Information Extraction and the Semantic Web

Antoine Amarilli 1

December 2, 2014

1 Course material adapted from Fabian Suchanek’s slides:

http://suchanek.name/work/teaching/IESW2010.pdf.

SPARQL example from:

(2)

Motivation

We have seen how search engines work at the level of words.

Sometimes, this works...

Sometimes, it doesn’t:

Those hard queries would be easy on RDBMSes!

We need to extract structured information.

We would like to understand its semantics.

(3)

Other motivations: job offerings

Title | Type | Location
Business strategy Associate | Part time | Palo Alto, CA
Registered Nurse | Full time | Los Angeles

(4)

Other motivations: scientific papers

Author | Publication | Year
Grishman | Information Extraction... | 2006
... | ... | ...

(5)

Other motivations: price comparison

Product | Type | Price
Dynex 32” | LCD TV | $1000

(6)

Table of contents

1 Introduction

2 Information Extraction

3 Semantic Web

(7)

Roadmap

Information Extraction (IE) is the process of extracting structured information (e.g., database tables) from unstructured machine-readable documents (e.g., Web documents).

Instance Extraction

Elvis Presley: singer

Angela Merkel: politician

Fact Extraction

Person | Nationality
Elvis Presley | American

Ontological Information Extraction and beyond (relation: nationality)

(8)

Instance extraction with Hearst patterns

Entities can be extracted and categorized by automatic extraction of simple patterns:

Many scientists, including Einstein, believed...

France, Germany and other countries have been plagued with...

Other forms of government such as constitutional monarchy...

Difficulties:

Must parse correctly.

Must be resilient to noise.
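The three example sentences above can be matched with a few regular expressions. The following is only a minimal sketch: the function name `hearst_instances` is hypothetical, and a capitalized word (or one or two lowercase words) stands in for a real noun-phrase parser, which is exactly the "must parse correctly" difficulty.

```python
import re

# Assumption: a capitalized word approximates a proper-noun phrase.
NP = r"[A-Z][a-z]+"

def hearst_instances(sentence):
    """Return (instance, category) pairs found by three simple Hearst patterns."""
    pairs = []
    # "<category>, including <Instance>"
    for m in re.finditer(rf"(\w+), including ({NP})", sentence):
        pairs.append((m.group(2), m.group(1)))
    # "<Instance>, <Instance> and other <category>"
    for m in re.finditer(rf"((?:{NP}, )*{NP}) and other (\w+)", sentence):
        for instance in m.group(1).split(", "):
            pairs.append((instance, m.group(2)))
    # "<category> such as <instance>" (crude: at most two lowercase words)
    for m in re.finditer(r"(\w+) such as ([a-z]+(?: [a-z]+)?)", sentence):
        pairs.append((m.group(2), m.group(1)))
    return pairs

print(hearst_instances("France, Germany and other countries have been plagued with..."))
```

Note that the last pattern yields the category "government" rather than "forms of government": surface patterns alone do not recover the full phrase.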

(9)

Instance extraction with set expansion

Start with a seed set of entities of a certain type.

Find occurrences of them at specific positions in documents:

Lists.

Table columns.

Assume that other items are other entities of the same nature.

Once again, this is noisy...

Precision and recall, see previous slides.
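The steps above can be sketched as follows. This is an illustration under stated assumptions, not a reference algorithm: the function `expand`, the `min_overlap` threshold, and the sample lists are all hypothetical, and each list stands for a list or table column scraped from a Web page.

```python
def expand(seeds, lists, min_overlap=2):
    """Add every item of any list that contains at least `min_overlap` seeds."""
    seeds = set(seeds)
    result = set(seeds)
    for items in lists:
        if len(seeds & set(items)) >= min_overlap:
            result |= set(items)
    return result

# Hypothetical lists, as if found on Web pages.
lists = [
    ["Russia", "Canada", "China", "USA", "Brazil"],
    ["USA", "Australia", "India", "Sudan"],
    ["Russia", "Vodka", "Caviar"],      # only one seed: ignored
    ["Australia", "USA", "Kangaroo"],   # two seeds, but noise slips in
]
result = expand({"Russia", "USA", "Australia"}, lists)
print(sorted(result))
```

The last list shows why the method is noisy: "Kangaroo" is accepted because it co-occurs with two seeds.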

(10)

Set expansion example

Seed set: {Russia, USA, Australia}

Result set: {Russia, Canada, China, USA, Brazil, Australia, India, Argentina, Kazakhstan, Sudan}

(11)

Fact extraction with wrapper induction

Observation: On Web pages of a certain domain, the information is often in the same spot.

(12)

Specifying a wrapper

A wrapper can be expressed:

As a path in the DOM (usually XPath).

Extensions to multiple pages, e.g., OXPath.

As a regular expression.

A wrapper can be produced:

Through manual annotation of the relevant fields.

Using specific knowledge of the source.

Wikipedia categories and infoboxes.

By comparison between similar pages to find what changed.

Using seed pairs (known facts).

Possibility to iterate between patterns and facts.

Risk of semantic drift.
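As an illustration of a wrapper expressed as DOM paths, here is a minimal sketch using the limited XPath subset of Python's standard `xml.etree.ElementTree`. The page layout, the class names, and the `apply_wrapper` helper are assumptions made for the example; real Web pages are rarely well-formed XML and would need a lenient HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical page layout, assumed identical on every product page of a site.
PAGE = """<html><body>
  <table id="offer">
    <tr>
      <td class="product">Dynex 32”</td>
      <td class="type">LCD TV</td>
      <td class="price">$1000</td>
    </tr>
  </table>
</body></html>"""

# The wrapper: one path per field to extract.
WRAPPER = {
    "Product": ".//td[@class='product']",
    "Type": ".//td[@class='type']",
    "Price": ".//td[@class='price']",
}

def apply_wrapper(page, wrapper):
    """Return one record: the text found at each field's path."""
    root = ET.fromstring(page)
    return {field: root.findtext(path) for field, path in wrapper.items()}

print(apply_wrapper(PAGE, WRAPPER))
```

The same wrapper can then be applied to every page that shares the layout, producing one table row per page.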

(13)

Fact extraction on text

Entity extraction:

find entities in the text.

Named entity recognition:

identify the type of entities.

person

organization

quantity

address

etc.

NLP patterns to extract facts

POS patterns

Parse trees.

(14)

Fact extraction on text: pattern matching

Einstein ha scoperto il K68, quando aveva 4 anni.

Bohr ha scoperto il K69 nel anno 1960.

Pattern: X ha scoperto il Y

Person | Discovery
Einstein | K68
Bohr | K69

The patterns can be more complex, e.g.

• regular expressions: X discovered the .{0,20} Y

• POS patterns: X discovered the ADJ? Y

• parse trees: [S [NP [PN X]] [VP [V discovered] [NP [PN Y]]]]
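The surface pattern "X ha scoperto il Y" can be tried directly on the two sentences of this slide. This is a minimal sketch: the name `extract_discoveries` is hypothetical, and X and Y are assumed to be single tokens.

```python
import re

# Surface pattern "X ha scoperto il Y"; X and Y are single tokens (assumption).
PATTERN = re.compile(r"(\w+) ha scoperto il (\w+)")

def extract_discoveries(text):
    """Return the (Person, Discovery) pairs matched by the pattern."""
    return PATTERN.findall(text)

text = ("Einstein ha scoperto il K68, quando aveva 4 anni. "
        "Bohr ha scoperto il K69 nel anno 1960.")
print(extract_discoveries(text))
```

The pattern fills the Person/Discovery table without any understanding of the language, which is the point of the example.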

(15)

Ontologies

Ontology: a set of entities and relations.

Name | Entities (M) | Facts (M) | Domain | Publisher | Start | Update
YAGO | >10 | >120 | general | MPI | 2008 | 2012
DBPedia [2] | 4.6 | 3 | general | OpenLink et al | 2007 | 2014
Wikidata [3] | 15 | 30 | general | Wikimedia | 2012 | 2014
Freebase [4] | 46 | 2 680 | general | Metaweb (Google) | 2007 | 2014
Knowl. Vault [5] | >570? | >1 600 | general | Google | 2014 | 2014
MusicBrainz [6] | >35 | >180 | music | MetaBrainz | 2003 | 2013
WordNet | 0.4 | 2 | English | Princeton | 1985 | 2006
ConceptNet [7] | 5.3 | 13 | obvious | MIT | 2000 | 2014

[2] http://blog.dbpedia.org/2014/09/09/dbpedia-version-2014-released/

[3] http://cacm.acm.org/magazines/2014/10/178785-wikidata/fulltext#body-9

[4] http://freebase.com/

[5] http://www.newscientist.com/article/mg22329832.

(16)

Entity disambiguation

Map mentions in text to entities.

Problem: mentions are ambiguous!

Use the importance of entities.

Use the likelihood that a term refers to an entity.

Use semantic consistency on the mappings of a document.

(Demo: https://gate.d5.mpi-inf.mpg.de/webaida/)
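A toy scoring function can combine two of the signals listed above: the prior likelihood that a mention refers to an entity, and agreement with the rest of the document. Everything here is an assumption for illustration: the function `disambiguate`, the scoring formula, and the related-word sets are invented; only the priors 0.8 and 0.2 echo the means(...) example used later in these slides. Context-word overlap is a crude stand-in for semantic consistency across all mappings of a document.

```python
def disambiguate(context_words, candidates):
    """Pick the candidate entity with the best prior-times-context score.

    candidates maps entity -> (prior, set of words related to the entity).
    """
    def score(entity):
        prior, related = candidates[entity]
        overlap = len(related & context_words)
        return prior * (1 + 2 * overlap)   # hypothetical weighting
    return max(candidates, key=score)

# Candidate entities for the mention "Elvis" (related words are made up).
candidates = {
    "Elvis_Presley": (0.8, {"rock", "Memphis", "Graceland"}),
    "Elvis_Costello": (0.2, {"punk", "London", "Attractions"}),
}
print(disambiguate({"punk", "London"}, candidates))
```

With no helpful context the prior wins; enough matching context can override it.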

(18)

Ontological IE

Use the existing ontology as reference.

Extract information from additional documents.

Use fuzzy rules to extend the ontology (often manual):

Extraction rules

Logical constraints

Common sense

Reasoning:

Datalog

Weighted MAX-SAT

Markov Logic Networks

(19)

Ontological IE: Reasoning (SOFIE system)

Inputs:

Ontology

NL documents: Elvis was born in 1935

Semantic Rules: birthdate<deathdate

First Order Logic Formulae:

type(Elvis_Presley,singer) subclassof(singer,person) ...

appears("Elvis","was born in","1935") ...

means("Elvis",Elvis_Presley,0.8) means("Elvis",Elvis_Costello,0.2) ...

born(X,Y) & died(X,Z) => Y<Z

appears(A,P,B) & R(A,B) => expresses(P,R)

appears(A,P,B) & expresses(P,R) => R(A,B)

...

Output: new facts (e.g., born(Elvis_Presley,1935)) with

1. canonic relations

2. canonic entities

3. consistency with the ontology
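The two pattern rules and the date constraint can be tried with plain forward chaining. This sketch is not how SOFIE reasons (the slides mention weighted MAX-SAT); it applies the rules as hard rules, assumes mentions are already mapped to entities, and uses hypothetical names (`infer`, `consistent`) and illustrative facts.

```python
def infer(appearances, facts):
    """Apply the two pattern rules until no new fact or pattern appears.

    appearances: set of (A, P, B) textual occurrences.
    facts: set of (R, A, B) known facts.
    Returns (facts, expresses), where expresses holds (P, R) pairs.
    """
    facts, expresses = set(facts), set()
    changed = True
    while changed:
        changed = False
        for a, p, b in appearances:
            # appears(A,P,B) & R(A,B) => expresses(P,R)
            for r, x, y in list(facts):
                if (x, y) == (a, b) and (p, r) not in expresses:
                    expresses.add((p, r))
                    changed = True
            # appears(A,P,B) & expresses(P,R) => R(A,B)
            for q, r in list(expresses):
                if q == p and (r, a, b) not in facts:
                    facts.add((r, a, b))
                    changed = True
    return facts, expresses

def consistent(facts):
    """Check born(X,Y) & died(X,Z) => Y<Z on all facts."""
    born = {x: y for r, x, y in facts if r == "born"}
    return all(born[x] < z for r, x, z in facts if r == "died" and x in born)

appearances = {("Elvis_Presley", "was born in", 1935),
               ("Albert_Einstein", "was born in", 1879)}
known = {("born", "Albert_Einstein", 1879), ("died", "Elvis_Presley", 1977)}
facts, expresses = infer(appearances, known)
print(sorted(facts, key=str), expresses, consistent(facts))
```

The known birth date of Einstein teaches the system that "was born in" expresses born, which then yields the new fact about Elvis Presley.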

(20)

Open IE

Use the entire Web as corpus.

Crawl the Web for new facts.

Create new rules from what is extracted.

Examples:

Open IE, University of Washington.

Demo: http://openie.cs.washington.edu/

Read the Web, CMU.

Demo: http://rtw.ml.cmu.edu/rtw/

(23)

Table of contents

1 Introduction

2 Information Extraction

3 Semantic Web

(24)

Motivation

Having structured data is nice.

However, independent sources are not useful.

We need to create links between data sources.

Run a query across multiple relevant data stores.

Perform complex transactions (booking a flight, a hotel...).

Rich data visualization (integrating e.g. maps and statistics).

We need to define the semantics of the data.

We need to enforce constraints.

We need to evaluate complex queries over multiple sources.

(25)

The Semantic Web

URIs: Globally unique identification of entities and relations.

OWL: Constraint language over structured data.

RDF: Storage format for structured data.

SPARQL: Query language for structured data.

LOD: Linked Open Data, drawing links between data sources.

(26)

URIs

Uniform Resource Identifier.

Like URLs.

Not always dereferenceable.

URNs: urn:isbn:0486415864

URLs, often with namespaces:

dbp:Paris for http://dbpedia.org/resource/Paris
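Expanding a namespace prefix is a simple string substitution. A minimal sketch, where the `PREFIXES` table and the `expand_curie` helper are hypothetical and only the `dbp` mapping comes from the slide:

```python
# Prefix table; the dbp mapping is the one used on this slide.
PREFIXES = {"dbp": "http://dbpedia.org/resource/"}

def expand_curie(name, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI; leave unknown prefixes untouched."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name

print(expand_curie("dbp:Paris"))
print(expand_curie("urn:isbn:0486415864"))
```

The URN is returned unchanged because `urn` is a URI scheme, not a declared prefix.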

(27)

RDF

Resource Description Framework.

Triples: Subject predicate object.

<dbp:Paris> <dbp:country> <dbp:France>

The country of Paris (DBPedia resources).

<dbp:Paris> <foaf:homepage> <http://www.paris.fr>

The homepage of Paris (FOAF relation, website).

<dbp:Paris> <foaf:name> "Paris"@en

The name of Paris (FOAF relation, literal value).

Multiple serializations.

(28)

RDFS

RDF Schema.

<dbp:Paris> <rdf:type> <dbp:Settlement>

Paris is a settlement.

<dbp:Settlement> <rdfs:subClassOf> <dbp:Place>

If I am a Settlement then I am a Place.

<dbp:writer> <rdfs:subPropertyOf> <y:created>

If you are the writer of something (for DBPedia) then you are the creator of that thing (for YAGO).
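The two kinds of RDFS statements above license simple inferences, which can be computed by applying the rules until nothing new is derived. A minimal sketch with triples as Python tuples; the function `rdfs_closure`, the triple about Douglas Adams, and the resource `ex:SomeBook` are hypothetical, and only two of the RDFS entailment rules are implemented.

```python
def rdfs_closure(triples):
    """Apply the subclass and subproperty rules until a fixpoint is reached."""
    triples = set(triples)
    while True:
        new = set()
        for s, p, o in triples:
            for s2, p2, o2 in triples:
                # x rdf:type C, C rdfs:subClassOf D => x rdf:type D
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # x p y, p rdfs:subPropertyOf q => x q y
                if p2 == "rdfs:subPropertyOf" and p == s2:
                    new.add((s, o2, o))
        if new <= triples:
            return triples
        triples |= new

data = {
    ("dbp:Paris", "rdf:type", "dbp:Settlement"),
    ("dbp:Settlement", "rdfs:subClassOf", "dbp:Place"),
    ("dbp:writer", "rdfs:subPropertyOf", "y:created"),
    ("dbp:Douglas_Adams", "dbp:writer", "ex:SomeBook"),  # hypothetical triple
}
closure = rdfs_closure(data)
print(("dbp:Paris", "rdf:type", "dbp:Place") in closure)
```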

(29)

OWL

Web Ontology Language.

<dbp:birthPlace> <rdf:type> <owl:FunctionalProperty>

People are born in at most one place.

<dbp:Person> <owl:disjointWith> <dbp:Settlement>

Something cannot be both a Person and a Settlement.

<schema:spouse> <owl:equivalentProperty> <dbp:spouse>

spouse in Schema.org and spouse in DBPedia are equivalent properties.

<myonto:p4242> <owl:sameAs> <dbp:Douglas_Adams>

Assert equalities between resources.
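The first two axioms can be read as integrity checks over a set of triples. This reading is a simplification: an actual OWL reasoner does not assume that different names denote different things, so two values for a functional property would normally lead it to infer that they are the same resource rather than to report an error. The function `violations` and the sample triples are hypothetical.

```python
def violations(triples):
    """List breaches of FunctionalProperty and disjointWith, read as constraints."""
    functional = {s for s, p, o in triples
                  if p == "rdf:type" and o == "owl:FunctionalProperty"}
    disjoint = {(s, o) for s, p, o in triples if p == "owl:disjointWith"}
    types, values = {}, {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
        if p in functional:
            values.setdefault((s, p), set()).add(o)
    found = []
    for (s, p), objs in values.items():
        if len(objs) > 1:
            found.append(("functional", s, p))
    for s, classes in types.items():
        for a, b in disjoint:
            if a in classes and b in classes:
                found.append(("disjoint", s, a, b))
    return found

# Hypothetical data with one breach of each kind.
triples = [
    ("dbp:birthPlace", "rdf:type", "owl:FunctionalProperty"),
    ("dbp:Person", "owl:disjointWith", "dbp:Settlement"),
    ("dbp:Elvis_Presley", "dbp:birthPlace", "dbp:Tupelo"),
    ("dbp:Elvis_Presley", "dbp:birthPlace", "dbp:Memphis"),
    ("dbp:Paris", "rdf:type", "dbp:Person"),
    ("dbp:Paris", "rdf:type", "dbp:Settlement"),
]
print(violations(triples))
```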

(30)

SPARQL

SPARQL Protocol And RDF Query Language.

Query language for RDF.

PREFIX abc: <http://example.com/exampleOntology#>
SELECT ?capital ?country WHERE {
  ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}

(Demo: http://dbpedia-live.openlinksw.com/sparql/)
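To see what the WHERE clause of this query means, its four triple patterns can be evaluated by hand over a small set of triples. The sketch below is a naive join, not how a SPARQL engine is implemented; the functions `match` and `query` and all `ex:` resources are hypothetical sample data. It requires Python 3.8 or later.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):            # variable: bind it or check it
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:                  # constant: must be equal
            return None
    return binding

def query(patterns, triples):
    """Evaluate a list of triple patterns by joining them one at a time."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

# Hypothetical data in the vocabulary of the query.
triples = [
    ("ex:cairo", "abc:cityname", "Cairo"),
    ("ex:cairo", "abc:isCapitalOf", "ex:egypt"),
    ("ex:egypt", "abc:countryname", "Egypt"),
    ("ex:egypt", "abc:isInContinent", "abc:Africa"),
    ("ex:paris", "abc:cityname", "Paris"),
    ("ex:paris", "abc:isCapitalOf", "ex:france"),
    ("ex:france", "abc:countryname", "France"),
    ("ex:france", "abc:isInContinent", "abc:Europe"),
]
patterns = [
    ("?x", "abc:cityname", "?capital"),
    ("?x", "abc:isCapitalOf", "?y"),
    ("?y", "abc:countryname", "?country"),
    ("?y", "abc:isInContinent", "abc:Africa"),
]
results = [(b["?capital"], b["?country"]) for b in query(patterns, triples)]
print(results)
```

The final list comprehension plays the role of SELECT ?capital ?country: it projects each solution onto the two requested variables.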

(32)

Linked Open Data

[Figure: the Linked Open Data cloud diagram, "Linked Datasets as of August 2014". Datasets are grouped by category: Linguistics, Social Networking, Life Sciences, Cross-Domain, Government, User-Generated Content, Publications, Geographic, Media.]

(33)

Linked Open Data (zoom)

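[Figure: zoom on part of the Linked Open Data cloud diagram, with datasets including Opencyc, Freebase, YAGO, KUPKB, DBpedia, EUNIS, Drugbank FU-Berlin, UMBEL, DBpedia NL, StatusNet Progval, Archiveshub Linked Data, FOAF-Profiles, LOV, Geospecies, Linked Open Data of Ecology, and Taxon concept.]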

(34)

Linked Open Data

Integrate many sources:

General ontologies.

Domain-specific ontologies.

Open data dumps.

Existing relational databases.

Find links:

Automatically.

By hand (relations, rules...).

Statistics:

Hundreds of ontologies.

504 million RDF links (2011).

52 billion triples (2012). 8

5 billion entities (2012, incl., e.g., 854 million people).

Wealth of data, still underused.

8 http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/

(35)

Challenges

Manage trust and attribution.

Manage time.

Manage uncertainty throughout the pipeline.

Extraction.

Trust.

Intrinsic uncertainty.

Manage complex facts.

Reification.

Manage noise.

Compute alignments.
