• Aucun résultat trouvé

PARIS: Probabilistic Alignment of Relationships, Instances, and Schema

N/A
N/A
Protected

Academic year: 2022

Partager "PARIS: Probabilistic Alignment of Relationships, Instances, and Schema"

Copied!
59
0
0

Texte intégral

(1)

PARIS: Probabilistic Alignment of Relationships, Instances, and Schema

Pierre Senellart (joint work with Serge Abiteboul, Antoine Amarilli, Fabian Suchanek)

7 July 2013

Data Extraction and Object Search

(2)

Outline

Ontology Matching Probabilistic Model Beyond PARIS: Joins Conclusion

(3)

RDF Ontologies

Singer

1935-01-08 plays

type

born Person

An RDF ontology can be seen as a graph of entities

subclassOf

(4)

RDF Ontologies

Singer

1935-01-08 plays

type

born Person

An RDF ontology can be seen as a graph of entities

subclassOf classes

relations / properties instances

literals

(5)

RDF Ontologies

instances

relations / properties classes

literals Singer

1935-01-08 plays

type

born Person

An RDF ontology can be seen as a graph of entities

subclassOf

intelligent search, QA, machine translation...

Ontologies serve all kinds of purposes:

(6)

Existing Ontologies

There are literally hundreds of ontologies on the Web

1935-01-08 Singer

type

born plays

YAGO

Person subclassOf

(7)

Overlapping Data

birthDate 1935

type RockSinger

marriedTo

YAGO ElvisPedia

1935-01-08 born Singer

type

Many ontologies contain similar or overlapping entities and facts

plays

(8)

Overlapping Data

birthDate 1935

type RockSinger

marriedTo

YAGO ElvisPedia

1935-01-08 born Singer

type

Many ontologies contain similar or overlapping entities and facts

plays

more specific

(9)

Overlapping Data

birthDate 1935

type RockSinger

marriedTo

YAGO ElvisPedia

1935-01-08 born Singer

type

Many ontologies contain similar or overlapping entities and facts

plays

more specific

complementary

(10)

Overlapping Data

birthDate 1935

type RockSinger

marriedTo

YAGO ElvisPedia

1935-01-08 born Singer

type

Many ontologies contain similar or overlapping entities and facts

plays

more specific

complementary equivalent

(11)

Problem: Elvis is lonely

ElvisPedia birthDate type

Singer

marriedTo type

YAGO 1935-01-08

born

1935 RockSinger plays

Who is the spouse of the guitar player?

Complementary information cannot be used

?

marriedTo

(12)

Solution: Unify Entities

ElvisPedia birthDate type

Singer

marriedTo type

YAGO 1935-01-08

born

1935 RockSinger plays

?

marriedTo

(13)

Goal: Merging Ontologies

ElvisPedia birthDate type

Singer

marriedTo type

YAGO 1935-01-08

born

1935 RockSinger plays

To merge two ontologies, we have to identify

(14)

Goal: Merging Ontologies

ElvisPedia birthDate type

Singer

marriedTo type

YAGO 1935-01-08

born

1935 RockSinger To merge two ontologies, we have to identify

• equivalent instances

plays sameAs

(15)

Goal: Merging Ontologies

subClassOf

ElvisPedia birthDate type

Singer

marriedTo type

YAGO 1935-01-08

born

1935 RockSinger To merge two ontologies, we have to identify

• equivalent instances

• equivalent or subsuming classes

sameAs plays

(16)

Goal: Merging Ontologies

sameAs subClassOf

ElvisPedia birthDate type

Singer

marriedTo type

YAGO 1935-01-08

born

1935 RockSinger To merge two ontologies, we have to identify

• equivalent instances

• equivalent or subsuming classes

• equivalent or subsuming relations

subPropertyOf plays

(17)

Outline

Ontology Matching Probabilistic Model Beyond PARIS: Joins Conclusion

(18)

Probabilistic Model

We chose a probabilistic model

the probability that x=y

the probability that is a sub-property of the probability that is a sub-class of

marriedTo

1935 type

1935-01-08

RockSinger

born

type

birthDate Singer

(19)

Literals

For literals, the probability of equality can be estimated upfront:

birthDate RockSinger type

1935 Singer

1935-01-08

type

born

marriedTo

?

(20)

Literals

For literals, the probability of equality can be estimated upfront:

birthDate RockSinger type

1935 Singer

1935-01-08

type

born

marriedTo

?

• for strings:

string distance

• for numbers:

numeric distance

• for other literals:

domain-specific

(21)

Literals

For literals, the probability of equality can be estimated upfront:

• for strings:

string distance

• for numbers:

numeric distance

• for other literals:

domain-specific birthDate

RockSinger type

1935 Singer

1935-01-08

type

born

marriedTo

?

We chose a particularly simple distance:

?

(22)

Equality of Instances

For instances, relations give a hint:

Elvis Presley label

label

born 1935

born

1935 Elvis Presley

marriedTo

?

(23)

Equality of Instances

For instances, relations give a hint:

Elvis Presley label

label

born 1935

born

1935 Elvis Presley

marriedTo

?

Not many people are called Elvis =>

highly indicative

(24)

Equality of Instances

For instances, relations give a hint:

Not many people are called Elvis =>

highly indicative

Elvis Presley label

label

born 1935

born

1935 Elvis Presley

marriedTo

?

Many people are born in 1935 =>

less indicative

(25)

Equality of Instances

Not many people are called Elvis =>

highly indicative

Many people are born in 1935 =>

less indicative

Elvis Presley label

label

born 1935

born

1935 Elvis Presley

marriedTo

?

one over the number of incoming links:

The local inverse functionality of a relation with an argument is

| |

(26)

Local Inverse Functionality

Elvis Presley

1935

marriedTo label

1935

born label

born

Elvis Presley Only one person

is called Elvis

The local inverse functionality of a relation with an argument is one over the number of incoming links:

?

| |

(27)

Local Inverse Functionality

Elvis Presley

1935

marriedTo label

1935

born label

born

Elvis Presley Only one person

is called Elvis

The local inverse functionality of a relation with an argument is one over the number of incoming links:

?

10 people are born in 1935

| |

(28)

Global Inverse Functionality

Elvis Presley Elvis Presley marriedTo label

born

1935 1935

label

born

?

the harmonic mean of the local inverse functionalities:

The global inverse functionality of a relation is

Many people share their year of birth Most people have distinct names

(29)

Equality of Instances

marriedTo Elvis Presley Elvis Presley

label label

born born

1935 1935

?

x and x' shall be matched if

Most people have distinct names

Many people share their year of birth

(30)

Equality of Instances

x and x' shall be matched if

(31)

Equality of Instances

x and x' shall be matched if

(32)

Equality of Instances

x and x' shall be matched if

Let's compute the probability of this happening,

(33)

Equality of Instances

x and x' shall be matched if

Let's compute the probability of this happening,

(adventurous independence assumptions go here)

(34)

Equality of Instances

x and x' shall be matched if

(adventurous independence assumptions go here) Let's compute the probability of this happening,

(35)

Equality of Instances

label marriedTo

1935 1935

born born

label

Elvis Presley Elvis Presley

?

This evaluates to 1 iff

• There is at least one highly inverse functional r

• There is one shared argument y=y'

(36)

Equality of Instances

• Literals: precomputed

• Others: recursive

label marriedTo

1935 1935

born born

label

Elvis Presley Elvis Presley

?

(37)

Equality of Relations

knows

marriedTo knows

If every pair of one relation is a pair of another relation, then the first is a sub-property of the second.

(38)

Equality of Relations

knows

marriedTo knows

If every pair of one relation is a pair of another relation, then the first is a sub-property of the second.

marriedTosubpropertyOf knows

(39)

Equality of Relations

knows

marriedTo knows

If every pair of one relation is a pair of another relation, then the first is a sub-property of the second.

marriedTo

knows subpropertyOf marriedTo

(40)

Equality of Relations

knows

marriedTo knows

marriedTo

If every pair of one relation is a pair of another relation, then the first is a sub-property of the second.

(Those in the intersection)

(Those that have a pendant) knows

subpropertyOf marriedTo

(41)

Algorithm

1. Fix equalities for literals

Literals:

(42)

Algorithm

Literals:

1. Fix equalities for literals

2. Set equalities for relations to a small initial value

Relations:

(43)

Algorithm

Instances:

Relations:

Literals:

1. Fix equalities for literals

2. Set equalities for relations to a small initial value

3. Iterate the estimations for relations and instances to convergence (*) Iterate

(44)

Algorithm

Instances:

Relations:

Literals:

Iterate 1. Fix equalities for literals

2. Set equalities for relations to a small initial value

3. Iterate the estimations for relations and instances to convergence (*) 4. Compute the estimations for classes

final Classes: step

(45)

Example

Elvis

dreamsOf label

Elvis

name Elvis Ontology 1 Ontology 2 Ontology 2

(46)

Example

Elvis

dreamsOf label

Elvis

name Elvis Ontology 1 Ontology 2 Ontology 2

Madonna label

Madonna name Ontology 1 Ontology 2

46 / 58

(47)

Example

Elvis

dreamsOf label

Elvis

name Elvis

label Madonna

name Madonna Ontology 1 Ontology 2 Ontology 2

Ontology 1 Ontology 2 Ciccone

1958 Frozen

47 / 58

(48)

Example

Elvis

dreamsOf label

Elvis

name Elvis

Ciccone 1958 Frozen label

Madonna

name Madonna Ontology 1 Ontology 2 Ontology 2

Ontology 1 Ontology 2

48 / 58

(49)

Example

Elvis

dreamsOf label

Elvis

name Elvis

Ciccone 1958 Frozen label

Madonna

name Madonna Ontology 1 Ontology 2 Ontology 2

Ontology 1 Ontology 2

49 / 58

(50)

Example

Elvis

dreamsOf label

Elvis

name Elvis

Ciccone 1958 Frozen label

Madonna

name Madonna Ontology 1 Ontology 2 Ontology 2

Ontology 1 Ontology 2

50 / 58

(51)

Example

Elvis

dreamsOf label

Elvis

name Elvis

Ciccone 1958 Frozen label

Madonna

name Madonna Ontology 1 Ontology 2 Ontology 2

Ontology 1 Ontology 2

51 / 58

(52)

Outline

Ontology Matching Probabilistic Model Beyond PARIS: Joins Conclusion

(53)

Join Relations

a:Douglas_Adams a:countryOfBirth a:UK

b:Douglas_Adams

b:Cambridge b:birthPlace

(b:birthPlace, b:country) b:UK

b:country

I The simplest possible difference in structure between ontologies: relations of one ontology correspond tojoin relationsin the other ontology.

I The terminology is motivated by the “join” operator of relational algebra.

I We see the join as abinary predicate: the intermediate nodes are existentially quantified but projected away.

(54)

Support in PARIS

I We must keep the representation of joinsimplicitin PARIS (memory constraints).

I We mustrecursively enumerateall possible join facts instead of enumerating all possible facts.

I We must avoidduplicate factscaused by multiple possible choices for the intermediate nodes.

I We cannot afford to enumerateall possible relations anymore (many possible joins).

⇒ New algorithmto compute the entity and relation alignments simultaneously.

(55)

Practical Issues

I How to determine thefunctionalityof join relations?

I How to selectinteresting joinsto perform without exploring all joins?

I How to achieve acceptablerunning timeon large ontologies?

⇒ We only perform the join alignment onsmallontologies.

(56)

Outline

Ontology Matching Probabilistic Model Beyond PARIS: Joins Conclusion

(57)

Conclusion

• PARIS matches relations, instances and schema holistically

• PARIS has no parameters to tune

• PARIS shows high precision and recall

marriedTo

(58)

Conclusion

• PARIS matches relations, instances and schema holistically

• PARIS has no parameters to tune

• PARIS shows high precision and recall

• PARIS allows the YAGO Elvis to marry the DBpedia Priscilla

marriedTo equals

(59)

Merci.

Références

Documents relatifs

The concept behind the RAPOD is to establish intellectual control of the probabilistic ontology (PO) model, stimulate reuse, and provide a basis for development through

We then proceed to showing that query rewriting is a complete tool for proving PTime data complexity in pOBDA, in the fol- lowing sense: we replace DL-Lite with the strictly

This paper extends a hybrid evolutionary approach, which applies a local optimization method, by taking instances into ac- count in order to reduce premature convergence

Unfortunately, using Type 2 semantics to interpret different kinds of probabili- ties complicates not only the representation of statistics but also the combination of

For each class, property, domain or range element in the RDF Schema XML, the XSLT builds a <span> element in the output HTML.. The span contains all the necessary attributes

The lattice-based matching method is applied to evaluate the spatiotempo- ral ontology and its corresponding resources; for a particular spatiotemporal resource, we align the

This quadruple specifies that an instance individual that has value val on property prop will matches the model individ- ual ind with probability prop.. Example The semantic

We applied semantic category matching (SCM) technology to the ontology alignment problem. Our method found pairs of semantically corresponding categories between two