PARIS: Probabilistic Alignement of Relationships, Instances, and Schema
Fabian Suchanek Serge Abiteboul Pierre Senellart
15 November 2011
Big-Data Analytics for the Temporal Web
2 / 82
Outline
Ontology Matching Probabilistic Model Experiments
Challenges for Large Evolving Ontologies Conclusion
RDF Ontologies
Singer
1935-01-08 plays
type
born Person
An RDF ontology can be seen as a graph of entities
subclassOf
RDF Ontologies
Singer
1935-01-08 plays
type
born Person
An RDF ontology can be seen as a graph of entities
subclassOf classes
relations / properties instances
literals
4 / 82
RDF Ontologies
instances
relations / properties classes
literals Singer
1935-01-08 plays
type
born Person
An RDF ontology can be seen as a graph of entities
subclassOf
Ontologies serve all kinds of purposes:
Existing Ontologies
There are literally hundreds of ontologies on the Web
1935-01-08 Singer
type
born plays
YAGO
Person subclassOf
6 / 82
Overlapping Data
birthDate 1935
type RockSinger
marriedTo
YAGO ElvisPedia
1935-01-08 born Singer
type
Many ontologies contain similar or overlapping entities and facts
plays
Overlapping Data
birthDate 1935
type RockSinger
marriedTo
YAGO ElvisPedia
1935-01-08 born Singer
type
Many ontologies contain similar or overlapping entities and facts
plays
more specific
8 / 82
Overlapping Data
birthDate 1935
type RockSinger
marriedTo
YAGO ElvisPedia
1935-01-08 born Singer
type
Many ontologies contain similar or overlapping entities and facts
plays
more specific
complementary
Overlapping Data
birthDate 1935
type RockSinger
marriedTo
YAGO ElvisPedia
1935-01-08 born Singer
type
Many ontologies contain similar or overlapping entities and facts
plays
more specific
complementary equivalent
10 / 82
Problem: Elvis is lonely
ElvisPedia birthDate type
Singer
marriedTo type
YAGO 1935-01-08
born
1935 RockSinger plays
Who is the spouse of the guitar player?
Complementary information cannot be used
?
marriedTo
Solution: Unify Entities
ElvisPedia birthDate type
Singer
marriedTo type
YAGO 1935-01-08
born
1935 RockSinger plays
?
marriedTo
12 / 82
Goal: Merging Ontologies
ElvisPedia birthDate type
Singer
marriedTo type
YAGO 1935-01-08
born
1935 RockSinger plays
To merge two ontologies, we have to identify
Goal: Merging Ontologies
ElvisPedia birthDate type
Singer
marriedTo type
YAGO 1935-01-08
born
1935 RockSinger To merge two ontologies, we have to identify
• equivalent instances
plays sameAs
14 / 82
Goal: Merging Ontologies
subClassOf
ElvisPedia birthDate type
Singer
marriedTo type
YAGO 1935-01-08
born
1935 RockSinger To merge two ontologies, we have to identify
• equivalent instances
• equivalent or subsuming classes
sameAs plays
Goal: Merging Ontologies
sameAs subClassOf
ElvisPedia birthDate type
Singer
marriedTo type
YAGO 1935-01-08
born
1935 RockSinger To merge two ontologies, we have to identify
• equivalent instances
• equivalent or subsuming classes
• equivalent or subsuming relations
subPropertyOf plays
16 / 82
Previous work
Previous work
• uses hard logical constraints, which may be inadequate
• requires parameter tuning
• has not been tried on large ontologies
birthDate marriedTo
1935 RockSinger type type
Singer
1935-01-08 born
Previous work
Previous work
• uses hard logical constraints, which may be inadequate
• requires parameter tuning
• has not been tried on large ontologies
birthDate marriedTo
1935 RockSinger type type
Singer
1935-01-08 born
(1)
• has mostly focused on (1) instance matching or (2) schema alignment, ... but not both
(2)
(1) (2)
18 / 82
PARIS: Aligning Everything At Once
1935-01-08 born There is a synergy between equality of instances, properties and classes!
=> Compute all together!
birthDate marriedTo
1935 type RockSinger type
Singer PARIS
20 / 82
Outline
Ontology Matching Probabilistic Model Experiments
Challenges for Large Evolving Ontologies Conclusion
Probabilistic Model
We chose a probabilistic model
the probability that x=y
the probability that is a sub-property of the probability that is a sub-class of
marriedTo
1935 type
1935-01-08
RockSinger
born
type
birthDate Singer
Literals
For literals, the probability of equality can be estimated upfront:
birthDate RockSinger type
1935 Singer
1935-01-08
type
born
marriedTo
?
22 / 82
Literals
For literals, the probability of equality can be estimated upfront:
birthDate RockSinger type
1935 Singer
1935-01-08
type
born
marriedTo
?
• for strings:
string distance
• for numbers:
numeric distance
• for other literals:
domain-specific
Literals
For literals, the probability of equality can be estimated upfront:
• for strings:
string distance
• for numbers:
numeric distance
• for other literals:
domain-specific birthDate
RockSinger type
1935 Singer
1935-01-08
type
born
marriedTo
?
We chose a particularly simple distance:
?
24 / 82Equality of Instances
For instances, relations give a hint:
Elvis Presley label
label
born 1935
born
1935 Elvis Presley
marriedTo
?
Equality of Instances
For instances, relations give a hint:
Elvis Presley label
label
born 1935
born
1935 Elvis Presley
marriedTo
?
Not many people are called Elvis =>
highly indicative
26 / 82
Equality of Instances
For instances, relations give a hint:
Not many people are called Elvis =>
highly indicative
Elvis Presley label
label
born 1935
born
1935 Elvis Presley
marriedTo
?
Many people are born in 1935 =>
less indicative
Equality of Instances
Not many people are called Elvis =>
highly indicative
Many people are born in 1935 =>
less indicative
Elvis Presley label
label
born 1935
born
1935 Elvis Presley
marriedTo
?
one over the number of incoming links:
The local inverse functionality of a relation with an argument is
| |
28 / 82
Local Inverse Functionality
Elvis Presley
1935
marriedTo label
1935
born label
born
Elvis Presley Only one person
is called Elvis
The local inverse functionality of a relation with an argument is one over the number of incoming links:
?
| |
Local Inverse Functionality
Elvis Presley
1935
marriedTo label
1935
born label
born
Elvis Presley Only one person
is called Elvis
The local inverse functionality of a relation with an argument is one over the number of incoming links:
?
10 people are born in 1935
| |
30 / 82
Global Inverse Functionality
Elvis Presley Elvis Presley marriedTo label
born
1935 1935
label
born
?
the harmonic mean of the local inverse functionalities:
The global inverse functionality of a relation is
Many people share their year of birth Most people have distinct names
Equality of Instances
marriedTo Elvis Presley Elvis Presley
label label
born born
1935 1935
?
x and x' shall be matched if
Most people have distinct names
Many people share their year of birth
32 / 82
Equality of Instances
x and x' shall be matched if
Equality of Instances
x and x' shall be matched if
34 / 82
Equality of Instances
x and x' shall be matched if
Let's compute the probability of this happening,
Equality of Instances
x and x' shall be matched if
Let's compute the probability of this happening,
(adventurous independence assumptions go here)
36 / 82
Equality of Instances
x and x' shall be matched if
(adventurous independence assumptions go here) Let's compute the probability of this happening,
Equality of Instances
label marriedTo
1935 1935
born born
label
Elvis Presley Elvis Presley
?
This evaluates to 1 iff
• There is at least one highly inverse functional r
• There is one shared argument y=y'
38 / 82
Equality of Instances
• Literals: precomputed
• Others: recursive
label marriedTo
1935 1935
born born
label
Elvis Presley Elvis Presley
?
Unequality of Instances
1935 1970
born born
marriedTo How do we guard against inequalities?
?
Elvis Presley label
Elvis Presley label
40 / 82
Unequality of Instances
1935 1970
born born
marriedTo How do we guard against inequalities?
?
Elvis Presley label
Elvis Presley label
These cannot be equal, because they have a different value
for a functional relation.
Unequality of Instances
1935 1970
born born
marriedTo How do we guard against inequalities?
?
Elvis Presley label
Elvis Presley label
These cannot be equal, because they have a different value
for a functional relation.
The functionality is defined analogously to the inverse functionality.
42 / 82
Unequality of Instances
1935 1970
born born
marriedTo
Every value in one ontology should have a pendant in the other, weighted by the functionality:
?
This factor makes sure that for every y, there is one equivalent y'
Unequality of Instances
1935 1970
born born
marriedTo
Every value in one ontology should have a pendant in the other, weighted by the functionality:
?
This factor makes sure that for every y, there is one
equivalent y' sang
Lavender Blue Twinkle Twinkle
sang All shook up
Jailhouse Rock
44 / 82
Equality of Classes
type singer type
Rock-singer
If all instances of one class are instances of the other then the former subsumes the latter
Equality of Classes
type singer type
Rock-singer
If all instances of one class are instances of the other then the former subsumes the latter
subclassOf
46 / 82
Equality of Classes
type singer type
Rock-singer subclassOf
If all instances of one class are instances of the other then the former subsumes the latter
| |
| |
Equality of Classes
type singer type
Rock-singer subclassOf
If all instances of one class are instances of the other then the former subsumes the latter
| |
| | | |
48 / 82
Equality of Classes
type singer type
Rock-singer subclassOf
If all instances of one class are instances of the other then the former subsumes the latter
| |
| | | |
| |
Equality of Relations
knows
marriedTo knows
If every pair of one relation is a pair of another relation, then the first is a sub-property of the second.
50 / 82
Equality of Relations
knows
marriedTo knows
If every pair of one relation is a pair of another relation, then the first is a sub-property of the second.
marriedTosubpropertyOf knows
Equality of Relations
knows
marriedTo knows
If every pair of one relation is a pair of another relation, then the first is a sub-property of the second.
marriedTo
knows subpropertyOf marriedTo
52 / 82
Equality of Relations
knows
marriedTo knows
marriedTo
If every pair of one relation is a pair of another relation, then the first is a sub-property of the second.
(Those in the intersection)
(Those that have a pendant) knows
subpropertyOf marriedTo
Algorithm
1. Fix equalities for literals
Literals:
54 / 82
Algorithm
Literals:
1. Fix equalities for literals
2. Set equalities for relations to a small initial value
Relations:
Algorithm
Instances:
Relations:
Literals:
1. Fix equalities for literals
2. Set equalities for relations to a small initial value
3. Iterate the estimations for relations and instances to convergence (*)
(*) We have no proof for convergence, but it seems to happen
Iterate
56 / 82
Algorithm
Instances:
Relations:
Literals:
Iterate 1. Fix equalities for literals
2. Set equalities for relations to a small initial value
3. Iterate the estimations for relations and instances to convergence (*) 4. Compute the estimations for classes
final Classes: step
Example
Elvis
dreamsOf label
Elvis
name Elvis Ontology 1 Ontology 2 Ontology 2
58 / 82
Example
Elvis
dreamsOf label
Elvis
name Elvis Ontology 1 Ontology 2 Ontology 2
label name
Ontology 1 Ontology 2
Example
Elvis
dreamsOf label
Elvis
name Elvis
label Madonna
name Madonna Ontology 1 Ontology 2 Ontology 2
Ontology 1 Ontology 2 Ciccone
1958 Frozen
60 / 82
Example
Elvis
dreamsOf label
Elvis
name Elvis
Ciccone 1958 Frozen
label name
Ontology 1 Ontology 2 Ontology 2
Ontology 1 Ontology 2
Example
Elvis
dreamsOf label
Elvis
name Elvis
Ciccone 1958 Frozen label
Madonna
name Madonna Ontology 1 Ontology 2 Ontology 2
Ontology 1 Ontology 2
62 / 82
Example
Elvis
dreamsOf label
Elvis
name Elvis
Ciccone 1958 Frozen
label name
Ontology 1 Ontology 2 Ontology 2
Ontology 1 Ontology 2
Example
Elvis
dreamsOf label
Elvis
name Elvis
Ciccone 1958 Frozen label
Madonna
name Madonna Ontology 1 Ontology 2 Ontology 2
Ontology 1 Ontology 2
64 / 82
Outline
Ontology Matching Probabilistic Model Experiments
Challenges for Large Evolving Ontologies Conclusion
Experiment: YAGO and DBpedia
Singer
SuperHuman
1977
1935 1935
died born
gender male
born with stillborn twin brother
DBpedia YAGO
1.7m instances 19m facts
2.6m instances 18m facts
66 / 82
Experiment: YAGO and DBpedia
Singer
SuperHuman
1977
1935 1935
died born
gender male
born with stillborn twin brother
DBpedia YAGO
1.7m instances 19m facts
2.6m instances 18m facts Independent
class hierarchies
Experiment: YAGO and DBpedia
Singer
SuperHuman
1977
1935 1935
died born
gender male Independent
class hierarchies
born with stillborn twin brother
DBpedia YAGO
1.7m instances 19m facts
2.6m instances 18m facts 50% overlap
of instances
68 / 82
Experiment: YAGO and DBpedia
Singer
SuperHuman
1977
1935 1935
died born
gender male Independent
class hierarchies
50% overlap of instances
born with stillborn twin brother
DBpedia YAGO
1.7m instances 19m facts
2.6m instances 18m facts Independent
properties
Experiment: YAGO and DBpedia
Singer
SuperHuman
1977
1935 1935
died born
gender male Independent
class hierarchies
50% overlap of instances Independent properties
born with stillborn twin brother
DBpedia YAGO
1.7m instances 19m facts
2.6m instances 18m facts Many
complementary facts
70 / 82
Experiment: YAGO and DBpedia
Iteration 1 2 3 4
Change -- 12%
1%
0.3%
Time 4:04h 5:06h 5:00h 5:30h
Precision 86%
89%
90%
90%
Recall 69%
73%
73%
73%
Matching the instances:
Experiment: YAGO and DBpedia
Iteration 1 2 3 4
Change -- 12%
1%
0.3%
Time 4:04h 5:06h 5:00h 5:30h
Precision 86%
89%
90%
90%
Recall 69%
73%
73%
73%
Matching the instances:
Precision: 74%
Aligns roughly half of DBpedia's classes Matching the classes:
72 / 82
Experiment: YAGO and DBpedia
yago:actedIn = dbpedia:starring- yago:hasChild = dbpedia:child yago:hasChild = dbpedia:parent- yago:created > dbpedia:writer yago:diedIn > dbpedia:placeOfBurial ... and many more. Precision: 96%
Matching the relations:
Experiment: YAGO and DBpedia
yago:actedIn = dbpedia:starring- yago:hasChild = dbpedia:child yago:hasChild = dbpedia:parent- yago:created > dbpedia:writer yago:diedIn > dbpedia:placeOfBurial ... and many more. Precision: 96%
Matching the relations:
+ more experiments on OAEI standard datasets and IMDb
74 / 82
Experiment: YAGO and DBpedia
+ more experiments on OAEI standard datasets and IMDb yago:actedIn = dbpedia:starring-
yago:hasChild = dbpedia:child yago:hasChild = dbpedia:parent- yago:created > dbpedia:writer yago:diedIn > dbpedia:placeOfBurial ... and many more. Precision: 96%
Matching the relations:
All experiments run with the same settings.
No parameter tuning.
76 / 82
Outline
Ontology Matching Probabilistic Model Experiments
Challenges for Large Evolving Ontologies Conclusion
Execution Time
I Essentiallyquadraticalgorithm
∀x1∈O1∀x2∈O2 Pr(x≡x0) =. . .
I Optimizations make it possible to only compare entities that have something in common, but only because of our trivial literal matching mechanism
I Lots of random accesses: using aSSDrather than a magnetic hard drive gives us a 25×speedup!
I Still,at the limitof what is processable by a centralized, naïve implementation of our approach
I What if we had 10 million instances? 1 billion instances?
78 / 82
How to Improve Efficiency
I Parallelization (à la MapReduce) isvery much possible:
I In one iteration, the new value of Pr(x≡y)does not depend on the new value of Pr(x0 ≡y0)forx0 6=y0.
I (Costly)synchronizationneeded at the end of each iteration (similar to PageRank computation)
I Does not solve everything, though: does not change the fact that we deal with anessentially quadratic
computation!
I Clever indexingdefinitely needed, but not trivial, since distance between literals might not be a nice metric
I Samplingprobably the way to go: a human does not need to compare all pairs to determine that two relations are the same.
I What abouton demand matchingof pairs, based on local neighborhoods?
Evolving Ontologies
I Ontologies arenot static, new facts get added all the time (and some facts get deleted):
I incrementally (e.g., IMDb);
I each time the dataset is regenerated (e.g., YAGO).
I Raises two challenges:
I Can matching be performedincrementallyas well?
I How to deal withtime-dependent equalities(e.g.,Barack ObamaandPresident of the United States) and
time-dependent functional dependencies?
80 / 82
Outline
Ontology Matching Probabilistic Model Experiments
Challenges for Large Evolving Ontologies Conclusion
Conclusion
• PARIS matches relations, instances and schema holistically
• PARIS has no parameters to tune
• PARIS shows high precision and recall
marriedTo
Conclusion
• PARIS matches relations, instances and schema holistically
• PARIS has no parameters to tune
• PARIS shows high precision and recall
• PARIS allows the YAGO Elvis to marry the DBpedia Priscilla
marriedTo equals
82 / 82