PARIS: Probabilistic Alignment of Relationships, Instances, and Schema
Pierre Senellart (joint work with Serge Abiteboul, Antoine Amarilli, Fabian Suchanek)
7 July 2013
Data Extraction and Object Search
Outline
• Ontology Matching
• Probabilistic Model
• Beyond PARIS: Joins
• Conclusion
RDF Ontologies
An RDF ontology can be seen as a graph of entities
[Figure: RDF graph with nodes Singer, Person, and 1935-01-08 and edges type, subclassOf, born, and plays; callouts mark classes, instances, literals, and relations / properties]
Ontologies serve all kinds of purposes: intelligent search, QA, machine translation...
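The graph view of an RDF ontology can be sketched as a list of (subject, relation, object) triples. This is only a minimal illustration: the subject identifier "Elvis", the object "guitar", and the helper name `outgoing_edges` are assumptions made for the example and do not come from the slides.

```python
from collections import defaultdict

# Toy triples; "Elvis" and "guitar" are hypothetical identifiers.
triples = [
    ("Elvis", "type", "Singer"),
    ("Elvis", "born", "1935-01-08"),
    ("Elvis", "plays", "guitar"),
    ("Singer", "subclassOf", "Person"),
]


def outgoing_edges(triples):
    """Index triples as an adjacency map: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


graph = outgoing_edges(triples)
```

Here classes (Singer, Person), instances (Elvis), and literals (1935-01-08) are all nodes, and relations are edge labels.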
Existing Ontologies
There are literally hundreds of ontologies on the Web
[Figure: excerpt of YAGO with Singer subclassOf Person and an entity with type Singer, born 1935-01-08, and a plays edge]
Overlapping Data
Many ontologies contain similar or overlapping entities and facts
[Figure: YAGO (born 1935-01-08, type Singer, plays) beside ElvisPedia (birthDate 1935, type RockSinger, marriedTo); callouts mark facts as equivalent, more specific, or complementary]
Problem: Elvis is lonely
Who is the spouse of the guitar player?
Complementary information cannot be used
[Figure: YAGO (born 1935-01-08, type Singer, plays) and ElvisPedia (birthDate 1935, type RockSinger, marriedTo ?) shown as separate graphs]
Solution: Unify Entities
[Figure: the YAGO and ElvisPedia graphs (born 1935-01-08, type Singer, plays; birthDate 1935, type RockSinger, marriedTo ?) with their entities unified]
Goal: Merging Ontologies
To merge two ontologies, we have to identify
• equivalent instances
• equivalent or subsuming classes
• equivalent or subsuming relations
[Figure: the YAGO and ElvisPedia graphs linked by sameAs, subClassOf, and subPropertyOf edges]
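Probabilistic Model
We chose a probabilistic model:
• $\Pr(x \equiv y)$: the probability that x=y
• $\Pr(r \subseteq r')$: the probability that r is a sub-property of r'
• $\Pr(c \subseteq c')$: the probability that c is a sub-class of c'
[Figure: the two Elvis graphs (born 1935-01-08, type Singer; birthDate 1935, type RockSinger, marriedTo)]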
Literals
For literals, the probability of equality can be estimated upfront:
• for strings: string distance
• for numbers: numeric distance
• for other literals: domain-specific
We chose a particularly simple distance: [formula lost in extraction]
[Figure: the two Elvis graphs, with the literals 1935 (birthDate) and 1935-01-08 (born) to compare]
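The three cases for literals can be sketched as follows. The slides do not show which distance was actually chosen, so every function below is an assumption for illustration: `string_equality` uses the similarity ratio from Python's `difflib`, `number_equality` uses a relative difference, and `identity_equality` is merely one very simple candidate.

```python
from difflib import SequenceMatcher


def string_equality(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], used here as a stand-in for a string distance."""
    return SequenceMatcher(None, a, b).ratio()


def number_equality(a: float, b: float) -> float:
    """1.0 for equal numbers, decreasing with the relative difference."""
    if a == b:
        return 1.0
    # a != b implies that at least one of them is non-zero.
    return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b)))


def identity_equality(a, b) -> float:
    """A very simple choice: 1.0 if the literals are identical, else 0.0."""
    return 1.0 if a == b else 0.0
```

Because these scores depend only on the two literals, they can be computed once and reused whenever the literals are compared.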
Equality of Instances
For instances, relations give a hint:
Not many people are called Elvis => highly indicative
Many people are born in 1935 => less indicative
[Figure: two entities, each with label Elvis Presley and born 1935, one also with a marriedTo edge; a question mark asks whether they are the same]
Local Inverse Functionality
The local inverse functionality of a relation $r$ with an argument $y$ is one over the number of incoming links:
$\mathit{fun}^{-1}(r, y) = \frac{1}{|\{x : r(x, y)\}|}$
Only one person is called Elvis => 1
10 people are born in 1935 => 1/10
[Figure: two entities, each with label Elvis Presley and born 1935, one also with a marriedTo edge]
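Global Inverse Functionality
The global inverse functionality of a relation is the harmonic mean of the local inverse functionalities:
$\mathit{fun}^{-1}(r) = \frac{|\{y : \exists x\, r(x, y)\}|}{\sum_{y : \exists x\, r(x, y)} 1 / \mathit{fun}^{-1}(r, y)}$
Most people have distinct names
Many people share their year of birth
[Figure: two entities, each with label Elvis Presley and born 1935, one also with a marriedTo edge]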
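Both notions of inverse functionality can be computed directly from a list of facts. The sketch below assumes facts are (subject, relation, object) tuples; the function names and the toy data (identifiers `p1`, `p2`, and the second name) are hypothetical.

```python
from collections import defaultdict


def local_inverse_functionality(facts, relation):
    """Map each argument y of `relation` to 1 / (number of x with relation(x, y))."""
    sources = defaultdict(set)
    for x, r, y in facts:
        if r == relation:
            sources[y].add(x)
    return {y: 1.0 / len(xs) for y, xs in sources.items()}


def global_inverse_functionality(facts, relation):
    """Harmonic mean of the local inverse functionalities of `relation`."""
    local = local_inverse_functionality(facts, relation)
    if not local:
        return 0.0
    return len(local) / sum(1.0 / value for value in local.values())


# Hypothetical toy data: two people with distinct names, same birth year.
facts = [
    ("p1", "label", "Elvis Presley"),
    ("p1", "born", "1935"),
    ("p2", "label", "Some Other Name"),
    ("p2", "born", "1935"),
]
```

On this toy data, `label` gets a global inverse functionality of 1.0 (names are distinct) and `born` gets 0.5 (two people share 1935), which matches the intuition that a shared name is more indicative than a shared birth year.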
Equality of Instances
x and x' shall be matched if
$\exists r, y, y' : r(x, y) \wedge r(x', y') \wedge y \equiv y' \wedge \mathit{fun}^{-1}(r)\ \text{is high}$
Let's compute the probability of this happening (adventurous independence assumptions go here):
$\Pr(x \equiv x') = 1 - \prod_{r(x, y),\ r(x', y')} \bigl(1 - \mathit{fun}^{-1}(r) \cdot \Pr(y \equiv y')\bigr)$
This evaluates to 1 iff
• There is at least one highly inverse functional r
• There is one shared argument y=y'
The probabilities $\Pr(y \equiv y')$ come from:
• Literals: precomputed
• Others: recursive
[Figure: two entities, each with label Elvis Presley and born 1935, one also with a marriedTo edge; a question mark asks whether they are the same]
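One way to turn this matching rule into a number is a noisy-or over all pairs of facts that use the same relation: each pair contributes evidence weighted by the inverse functionality of the relation and by the probability that the two arguments are equal. The sketch below is an assumption-laden illustration, not the authors' code: it assumes both ontologies already use the same relation names, and the inverse functionality values in the usage note are made up.

```python
def instance_equality(facts_x, facts_x2, inv_fun, equal_prob):
    """Noisy-or estimate that x and x' denote the same instance.

    facts_x, facts_x2: lists of (relation, argument) pairs for x and x'.
    inv_fun: dict mapping a relation to its global inverse functionality.
    equal_prob: function (y, y') -> probability that y and y' are equal.
    """
    prob_no_evidence = 1.0
    for r, y in facts_x:
        for r2, y2 in facts_x2:
            if r == r2:
                # Each shared relation is treated as independent evidence.
                prob_no_evidence *= 1.0 - inv_fun[r] * equal_prob(y, y2)
    return 1.0 - prob_no_evidence


def identical(y, y2):
    """Stand-in for precomputed literal equality: 1.0 if identical, else 0.0."""
    return 1.0 if y == y2 else 0.0
```

With hypothetical values `inv_fun = {"label": 1.0, "born": 0.1}`, two instances sharing both the label "Elvis Presley" and the birth year 1935 score 1.0, while two instances sharing only the birth year score about 0.1.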
Equality of Relations
If every pair of one relation is a pair of another relation, then the first is a sub-property of the second.
[Figure: example pairs linked by marriedTo and knows]
marriedTo subPropertyOf knows
$\Pr(\mathit{knows}\ \text{subPropertyOf}\ \mathit{marriedTo}) = \frac{\#\ \text{knows pairs that are also marriedTo pairs (those in the intersection)}}{\#\ \text{knows pairs (those that have a counterpart)}}$
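The ratio behind sub-property estimation can be sketched with plain sets of pairs. This is a simplification under stated assumptions: entity matches are treated as already decided (crisp rather than probabilistic), "having a counterpart" is read as both ends of the pair existing in the other ontology, and all names and toy data are hypothetical.

```python
def subproperty_probability(pairs_r, pairs_r2, entities_2):
    """Estimate Pr(r subPropertyOf r2) from already-matched pairs.

    pairs_r: set of (x, y) pairs of relation r, rewritten with the
             identifiers of the other ontology where a match is known.
    pairs_r2: set of (x, y) pairs of relation r2 in the other ontology.
    entities_2: set of identifiers that exist in the other ontology.
    """
    # Only pairs whose both ends have a counterpart can be checked.
    with_counterpart = {(x, y) for x, y in pairs_r
                        if x in entities_2 and y in entities_2}
    if not with_counterpart:
        return 0.0
    return len(with_counterpart & pairs_r2) / len(with_counterpart)


# Hypothetical toy data.
married_to = {("Elvis", "Priscilla")}
knows = {("Elvis", "Priscilla"), ("Elvis", "Madonna")}
entities = {"Elvis", "Priscilla", "Madonna"}
```

On this data every marriedTo pair is also a knows pair, so marriedTo scores 1.0 as a sub-property of knows, while knows scores only 0.5 as a sub-property of marriedTo.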
Algorithm
1. Fix equalities for literals
2. Set equalities for relations to a small initial value
3. Iterate the estimations for relations and instances to convergence (*)
4. Compute the estimations for classes
[Figure: Literals feed into a loop that iterates between Relations and Instances; Classes are computed in a final step]
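The four steps can be sketched as a fixed-point loop. Everything here is an assumption made for illustration: the three estimation functions are passed in as callbacks, and the initial value, tolerance, and round limit are arbitrary choices rather than values taken from the slides.

```python
def max_change(old, new):
    """Largest absolute difference between two score dictionaries."""
    keys = set(old) | set(new)
    return max((abs(new.get(k, 0.0) - old.get(k, 0.0)) for k in keys),
               default=0.0)


def run_alignment(literal_eq, relation_pairs, estimate_instances,
                  estimate_relations, estimate_classes,
                  initial=0.1, tolerance=1e-6, max_rounds=50):
    # Step 1: literal equalities are computed upfront and never change.
    # Step 2: every candidate relation pair starts at a small value.
    relation_eq = {pair: initial for pair in relation_pairs}
    instance_eq = {}
    # Step 3: alternate the two estimations until the scores stop moving.
    for _ in range(max_rounds):
        new_instance_eq = estimate_instances(literal_eq, relation_eq)
        new_relation_eq = estimate_relations(new_instance_eq)
        change = max(max_change(instance_eq, new_instance_eq),
                     max_change(relation_eq, new_relation_eq))
        instance_eq, relation_eq = new_instance_eq, new_relation_eq
        if change <= tolerance:
            break
    # Step 4: classes are estimated once, from the final instance scores.
    class_eq = estimate_classes(instance_eq)
    return instance_eq, relation_eq, class_eq
```

Keeping class estimation outside the loop mirrors the "final step" in the outline: class scores are derived from the converged instance scores and do not feed back into them.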
Example
[Figure, shown in successive steps: two toy graphs marked Ontology 1 and Ontology 2, with entities Elvis and Madonna, relations dreamsOf, label, and name, and the further values Ciccone, 1958, and Frozen]
Join Relations
[Figure: in ontology a, a:Douglas_Adams a:countryOfBirth a:UK; in ontology b, b:Douglas_Adams b:birthPlace b:Cambridge and b:Cambridge b:country b:UK, so a:countryOfBirth corresponds to the join (b:birthPlace, b:country)]
• The simplest possible difference in structure between ontologies: relations of one ontology correspond to join relations in the other ontology.
• The terminology is motivated by the “join” operator of relational algebra.
• We see the join as a binary predicate: the intermediate nodes are existentially quantified but projected away.
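A join relation viewed as a binary predicate can be sketched as follows. The function name `join_relation` and the fact layout are assumptions; the example facts reuse the identifiers from the Douglas Adams example. Note that this sketch materializes the join for clarity, which is only reasonable on small inputs.

```python
def join_relation(facts, first, second):
    """Pairs (x, z) such that first(x, y) and second(y, z) hold for some y.

    Returning a set projects the intermediate node y away and removes
    duplicates that arise when several y connect the same x and z.
    """
    by_subject = {}
    for x, r, y in facts:
        if r == second:
            by_subject.setdefault(x, set()).add(y)
    return {(x, z)
            for x, r, y in facts if r == first
            for z in by_subject.get(y, ())}


facts_b = [
    ("b:Douglas_Adams", "b:birthPlace", "b:Cambridge"),
    ("b:Cambridge", "b:country", "b:UK"),
]

country_of_birth = join_relation(facts_b, "b:birthPlace", "b:country")
```

The resulting pair (b:Douglas_Adams, b:UK) is the counterpart of the single fact a:Douglas_Adams a:countryOfBirth a:UK in the other ontology.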
Support in PARIS
• We must keep the representation of joins implicit in PARIS (memory constraints).
• We must recursively enumerate all possible join facts instead of enumerating all possible facts.
• We must avoid duplicate facts caused by multiple possible choices for the intermediate nodes.
• We cannot afford to enumerate all possible relations anymore (many possible joins).
⇒ New algorithm to compute the entity and relation alignments simultaneously.
Practical Issues
• How to determine the functionality of join relations?
• How to select interesting joins to perform without exploring all joins?
• How to achieve acceptable running time on large ontologies?
⇒ We only perform the join alignment on small ontologies.
Conclusion
• PARIS matches relations, instances and schema holistically
• PARIS has no parameters to tune
• PARIS shows high precision and recall
• PARIS allows the YAGO Elvis to marry the DBpedia Priscilla
[Figure: graph with a marriedTo edge and an equals edge]