• Aucun résultat trouvé

Semantic Web MPRI 2.26.2: Web Data Management

N/A
N/A
Protected

Academic year: 2022

Partager "Semantic Web MPRI 2.26.2: Web Data Management"

Copied!
32
0
0

Texte intégral

(1)

Semantic Web

MPRI 2.26.2: Web Data Management

Antoine Amarilli Friday, January 11th

(2)

Motivation

Information on the Web is notstructured

This makes it difficult to:

Combine informationfrom multiple sources

Integratedifferent services

Reasonwith Web data

(3)

Motivation

Information on the Web is notstructured

This makes it difficult to:

Combine informationfrom multiple sources

Integratedifferent services

Reasonwith Web data

(4)

Goal

What if we could write:

SELECT ?book ?bookLabel WHERE {

?book wdt:P166/(wdt:P361*|wdt:P31) wd:Q194285, wd:Q188914 .

?book wdt:P50/wdt:P21 wd:Q6581072 . SERVICE wikibase:label { bd:serviceParam

wikibase:language "[AUTO_LANGUAGE],en". } }

(5)

Goal

What if we could write:

SELECT ?book ?bookLabel WHERE {

?book wdt:P166/(wdt:P361*|wdt:P31) wd:Q194285, wd:Q188914 .

?book wdt:P50/wdt:P21 wd:Q6581072 . SERVICE wikibase:label { bd:serviceParam

wikibase:language "[AUTO_LANGUAGE],en". } }

(6)

Applications

Most visible applicationtoday: rich search results

Binghttps://www.bing.com/webmaster/help/

marking-up-your-site-with-structured-data-3a93e731

Google Searchhttps://developers.google.com/search/docs/

guides/search-gallery

Yandex Searchhttps://yandex.ru/support/webmaster/

site-content/data-transmit.html?ncrnd=4299&lang=en

Also:SPARQL endpoints, e.g.,https://query.wikidata.org/

Indirectly:Graph databases

(7)

Key concepts

Semantic Web:putstructured dataon the Web

Entities:the things about which we want to express data

Ontology:avocabularyto talk about entities, specifying relations,classes, etc.

Knowledge base:a set of assertions about entities expressed following an ontology

Linked data:use URIs to createlinksbetween datasets

Information extraction:creating structured information out of existing Web Data (later)

(8)

Resources (or entities)

A resource is anything that can be referred to by aURI

• aweb page, identified by a URL

• afragment of an XML document, identified by an element node of the document,

• aweb service,

• anidentifier, e.g., an ISBN,

• a thing, an object, a concept, a property, etc.

The URI does not need to bedereferenceable

Acool URIis one that can be dereferenced to obtain information about the entity

https://www.wikidata.org/wiki/Q42

https://wiki.openstreetmap.org/wiki/Key:opening_hours

(9)

Ontologies

Formal descriptions providinghumanusers a shared understanding of a given domain

• A controlled vocabulary

Formally defined so that it can also be processed bymachines

Logical semanticsthat can be useful forreasoning

• to answer queries (over possibly distributed data)

• to relate objects in different data sources and integrate them

• to detect inconsistencies or redundancies

• to refine queries with too many answers

• to relax queries with no answer

(10)

Where Do Ontologies Come From?

Manually craftedto represent the knowledge of a specific domain (e.g., life sciences)

Exported fromclassical Web databases

Createdcollaboratively(e.g., Wikidata)

Privateto a company orpublic

Value of the Semantic Web: bits of ontologies can bere-usedin another, and ontologies can be mapped withowl:sameAs, owl:equivalentClass,owl:equivalentProperty, etc.

(11)

Classes and class hierarchy

Aclassdenotes a set of entities; e.g.,Professor,Country,Person

An entity can be aninstanceof one or several classes

Ex: MPRIinstanceOfMasterProgram

Ex: AcademicStaffsubClassOfStaff (interpreted as set inclusion)

FacultyComponent

Student

UndergraduateStudent MasterStudent

PhDStudent Department

PhysicsDept MathsDept CSDept Staff

AcademicStaff Lecturer Researcher Professor AdministrativeStaff

(12)

Relations

Arelationdenotes a binary relation between objects; e.g., locatedIn,father

Often convenient to express with asignature(classes for the subject and object)

TeachesIn(AcademicStaff, Course)

• if one states that “XTeachesInY”, thenXbelongs to AcademicStaff andYto Course

TeachesTo(AcademicStaff, Student)

Leads(Staff, Department)

(13)

Namespaces

Using full URLs for URIs is oftencumbersome

https://www.wikidata.org/entity/Q42

Idea:declare anamespacelikewdto stand for https://www.wikidata.org/entity/

Then you can write aCompact URI(aka CURIE)wd:Q42to mean https://www.wikidata.org/entity/Q42

This is like a simplification ofXML namespaces

Also adefault namespace, to write:Q42

In the slides I will just write entities like “MPRI” and ignore the issue

(14)

Ontologies and knowledge bases

To summarize:

Ontologyputs the focus on theschema, or TBox

• The set of class and relation names (= thevocabulary)

• Thesignaturesof relations and alsoconstraints

• The constraints can be used to:

• check data consistency (like dependencies in databases)

• infer new facts

Knowledge baseputs the focus on theinstance, or ABox

Entities(instances of a class), andfactsabout them (see next)

Manyknowledge basesprovide their ownontology(but may also use terms from other ontologies)

(15)

Ontology Languages for the Web

(16)

3 ontology languages for the Web

RDF:not really an ontology language (only ABox facts)

RDFS:schema for RDF, but very basic

OWL:a much richer ontology language

(17)

RDF: Resource Description Framework

RDF facts aretriplets

Each triplet is of the form:hSubject Predicate Objecti

The subject is a URI, referencing anentity (from the same KB or a different one)

The predicate is a URI, referencing arelation (from some ontology)

The object is either a URI, referencing anentity, or aliteral

h:Dupond:Leads:CSDepti

h:Dupond:HasNamePaul Dupondi

h:Dupond:TeachesIn:UE111i

h:Dupond:TeachesTo:Pierrei

h:Pierre:EnrolledIn:CSDepti

h:Pierre:RegisteredTo:UE111i

h:UE111:OfferedBy:CSDepti

(18)

RDF graph

A set of RDF facts defines

• a set of relations between objects

• anRDF graphwhere the nodes are objects:

:TeachesIn

:Dupond :Pierre

:InfoDept

UE111 :TeachesTo

:EnrolledIn

:RegisteredTo

:OfferedBy :Leads

(19)

RDF semantics

A tripleths P oiis interpreted in first-order logic (FOL) as a fact P(s,o)

Example:

Leads(Dupond,CSDept) TeachesIn(Dupond,UE111) TeachesTo(Dupond,Pierre) EnrolledIn(Pierre,CSDept) RegisteredTo(Pierre,UE111) OfferedBy(UE111,CSDept)

(20)

Literals

Sometimes theobjectof a statement is a literal value, e.g., a name, number, date

Literals can come with alanguage, i.e., an ISO language name

"France"@en

They can have adata type, which is just a URI

"28753"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>

"6.96E10"^^<http://dbpedia.org/datatype/euro>

Default data type, and default for literals with a language code

The data type also indicates how the value isinterpreted, how comparisonswork, etc.

(21)

RDF Serialization

Several serialization formats for RDF data, see alternate formats of

http://live.dbpedia.org/page/%C3%89lectricit%C3%A9_de_France:

RDF/XML, structured XML representation, allowing for nesting

N-Triples,Turtle,N3, text-based formats

RDFaandJSON-LDto integrate RDF annotations into HTML

(22)

N-Triples

Simplest text-based format

Eachfactis a triple of a subject, predicate, object, followed by a full stop.

<http://live.dbpedia.org/resource/\u00C9lectricit\u00E9_de_France>

<http://live.dbpedia.org/property/netIncome>

"3.2E9"^^<http://dbpedia.org/datatype/euro> .

Possibility ofcomments

Support forliteralswithlanguageanddatatype

All usual questions of quoting, escaping, character encodings...

(23)

Turtle

Turtleis a superset of N-Triples

Addsprefixes(for CURIEs) anddefault prefix

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

Addsfactoringof common subjects, common predicates :s1 :p1 :o1, :o2, :o3 ;

:p2 :o4 . :s2 :p3 :o5 .

Addsaalias for the type relation

(24)

Turtle (cont’d) and N3

Addssquare bracketsfor blank nodes (see later) ex:coursenotes ex:author [

foaf:name "Amarilli" ; foaf:givenname "Antoine"

] .

# stands for

ex:coursenotes ex:author _:b1 . _:b1 foaf:name "Amarilli" ;

foaf:givenname "Antoine" .

Addsbracketsforlinked lists

Notation3 (N3)is a superset of Turtle with additional features for semantic assertions (not just RDF data).

(25)

Blank nodes

Some entities may beblank nodes, i.e., nodes with no URI.

They are written_:xyzwithxyzbeing alocal identifier

Common usage:n-ary relations

ex:speaker ex:gaveSeminar _:seminar . _:seminar ex:date "2019-01-11"^^xsd:date . _:seminar ex:room ex:Room101 .

_:seminar ex:title "Example seminar title"@en .

(26)

Reification

Another common usage of blank nodes isreification:

dbp:Earth dbprop:creationDate "-4003-10-23"^^xsd:date .

# can be written as

_:stmt rdf:type rdf:Statement ; rdf:subject dbp:Earth ;

rdf:predicate dbprop:creationDate ; rdf:object "-4003-10-23"^^xsd:date .

# allowing us to say, e.g.,

_:stmt dbp:author dbp:James_Ussher .

(27)

RDF built-ins

Someclasses, i.e., literals, properties, statements, etc.

Somedatatypes, i.e., unordered lists, ordered lists, bags,

rdf:typeto say that something is an instance of a class

→ To say more about theschema, we needRDF Schema

(28)

RDF Schema (RDFS)

RDF Schemais a language to describe the schema of RDF documents

Do net get confused: RDFS can use RDF as syntax, i.e., RDFS statements are be themselves expressed as RDF triplets with some specificpropertiesandobjects

Declaration of classes and subclass relationships

hStaffrdf:typerdfs:Classi

hJavaCourserdfs:subClassOfCSCoursei

Declaration of instances

hDupondrdf:typeAcademicStaffi

(29)

RDF Schema - continued

Declaration of relations (properties in RDFS terminology)

hRegisteredTordf:typerdfs:Propertyi

Declaration of subproperty relationships

hLateRegisteredTordfs:subPropertyOfRegisteredToi

Declaration of domain and range for predicates (used for inference)

hTeachesInrdfs:domainAcademicStaffi

hTeachesInrdfs:rangeCoursei

TeachesIn(AcademicStaff, Course)

(30)

RDFS logical semantics

RDF and RDFS statements FOL translation DL notation

hirdf:typeCi C(i) i:CorC(i)

hi P ji P(i,j) i P jorP(i,j)

hCrdfs:subClassOfDi ∀X(C(X)D(X)) CvD hPrdfs:subPropertyOfRi ∀X∀Y(P(X,Y)R(X,Y)) PvR hPrdfs:domainCi ∀X∀Y(P(X,Y)C(X)) ∃PvC hPrdfs:rangeDi ∀X∀Y(P(X,Y)D(Y)) ∃PvD DL: Description logics, a specialized logical formalism

(31)

OWL: Web Ontology Language

OWL extends RDFS to expressricher constraints

Main OWL constructs

Disjointnessbetween classes

• Constraints offunctionalityandsymmetryon predicates

• Classunionandintersection

Inspired bydescription logics

Severalprofiles: OWL Full, OWL DL, OWL Lite, OWL 2 EL, OWL 2 QL, OWL 2 RL.

Different profiles include differentfeatures, and have different (in)tractability

(32)

Slide acknowledgements

Material reused from Pierre Senellart’s class

http://pierre.senellart.com/talks/romatre-20130923.pdf itself adapted fromWeb data management, S. Abiteboul, I. Manolescu, P. Rigaux, M.-C. Rousset, P. Senellart, Cambridge University Press, 2012. Also available at

http://webdam.inria.fr/Jorge/

Références

Documents relatifs

Firefox Gecko, and (work-in-progress) Servo, using Rust Safari WebKit engine. Chrome Blink (fork of Webkit, in

• Let’s Encrypt : automated check (ACME protocol) and signature of an HTTPS certificate. •

placeholder Help text indicated when the field is empty value Indicate the default value. required Must be filled in to submit the form type

• CSS3 makes it possible to position elements in two dimensions using CSS Grid. • Use display: grid on the element which will serve as

SVG+JS Javascript manipulation of vector graphics WebGL Javascript API for accelerated 3D using a GPU. WebVR Describe scenes for virtual reality headsets with a-scene WebRTC Voice

JSP Integrating Java and a Web server (e.g., Apache Tomcat) node.js Chrome’s JavaScript engine (V8) plus a Web server Python Web frameworks: Django , CherryPy, Flask. Ruby

DOM (Document Object Model) gives an API for the tree representation of HTML and XML documents (independent from the programming language). Schema languages Specify the structure

• YAML: extends JSON with several features, e.g., explicit types, user-defined types, anchors and references. • Some JSON parsers are permissive and allow,