PARIS: Probabilistic Alignment of Relationships, Instances, and Schema

Fabian Suchanek, Serge Abiteboul, Pierre Senellart

15 November 2011

Big-Data Analytics for the Temporal Web

Outline

• Ontology Matching
• Probabilistic Model
• Experiments
• Challenges for Large Evolving Ontologies
• Conclusion


RDF Ontologies

An RDF ontology can be seen as a graph of entities:

• classes
• instances
• literals
• relations / properties

[Figure: example RDF graph with the classes Singer and Person, the literal 1935-01-08, and the edges type, subclassOf, born and plays.]

Ontologies serve all kinds of purposes.

Existing Ontologies

There are literally hundreds of ontologies on the Web.

[Figure: fragment of the YAGO ontology with the classes Singer and Person, the literal 1935-01-08, and the edges type, subclassOf, born and plays.]


Overlapping Data

Many ontologies contain similar or overlapping entities and facts. Facts in one ontology can be more specific than, complementary to, or equivalent to facts in the other.

[Figure: YAGO (born 1935-01-08, type Singer, plays) next to ElvisPedia (birthDate 1935, type RockSinger, marriedTo).]

Problem: Elvis is lonely

Who is the spouse of the guitar player?

Complementary information cannot be used.

[Figure: YAGO (born 1935-01-08, type Singer, plays) and ElvisPedia (birthDate 1935, type RockSinger, marriedTo) side by side; the marriedTo edge on the YAGO side is marked "?".]

Solution: Unify Entities

[Figure: the same YAGO and ElvisPedia graphs (born 1935-01-08, type Singer, plays; birthDate 1935, type RockSinger, marriedTo), with the two instances unified.]


Goal: Merging Ontologies

To merge two ontologies, we have to identify

• equivalent instances (sameAs)
• equivalent or subsuming classes (subClassOf)
• equivalent or subsuming relations (subPropertyOf)

[Figure: YAGO (born 1935-01-08, type Singer, plays) and ElvisPedia (birthDate 1935, type RockSinger, marriedTo), connected by sameAs, subClassOf and subPropertyOf links.]


Previous work

Previous work

• uses hard logical constraints, which may be inadequate
• requires parameter tuning
• has not been tried on large ontologies
• has mostly focused on (1) instance matching or (2) schema alignment, but not both

[Figure: the two example graphs (born 1935-01-08, type Singer; birthDate 1935, type RockSinger, marriedTo), with the instance link marked (1) and the schema links marked (2).]

PARIS: Aligning Everything At Once

There is a synergy between equality of instances, properties and classes!

=> Compute all together!

[Figure: PARIS linking the two example graphs (born 1935-01-08, type Singer; birthDate 1935, type RockSinger, marriedTo).]


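Probabilistic Model

We chose a probabilistic model. It estimates

• $\Pr(x \equiv y)$: the probability that x = y
• $\Pr(r \subseteq r')$: the probability that r is a sub-property of r'
• $\Pr(c \subseteq c')$: the probability that c is a sub-class of c'

[Figure: the two example graphs (born 1935-01-08, type Singer; birthDate 1935, type RockSinger, marriedTo).]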


Literals

For literals, the probability of equality can be estimated upfront:

• for strings: string distance
• for numbers: numeric distance
• for other literals: domain-specific

We chose a particularly simple distance:

[Formula: the chosen literal distance, comparing literals such as 1935 and 1935-01-08 in the example graphs.]
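The formula for the chosen distance is not preserved in this text. As a minimal sketch, assuming the simplest possible choice (probability 1 for identical literals after normalization, 0 otherwise), an upfront literal comparison could look like this. The function name `literal_equality` and the normalization are illustrative assumptions, not taken from the slides.

```python
def literal_equality(a: str, b: str) -> float:
    """Return 1.0 if both literals are identical after normalization, else 0.0.

    Assumption: normalization is only lower-casing and whitespace stripping.
    """
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


same_name = literal_equality("Elvis Presley", "elvis presley ")
year_vs_date = literal_equality("1935", "1935-01-08")
print(same_name, year_vs_date)
```

Under this assumption a year and a full date are simply treated as different literals; a numeric or date-aware distance would be a drop-in replacement for the function body.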


Equality of Instances

For instances, relations give a hint:

• Not many people are called Elvis => highly indicative
• Many people are born in 1935 => less indicative

[Figure: two instances, each with label "Elvis Presley" and born 1935; one also has marriedTo. The link between the two instances is marked "?".]


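Local Inverse Functionality

The local inverse functionality of a relation with an argument is one over the number of incoming links:

$$\mathit{fun}^{-1}(r, y) = \frac{1}{|\{x : r(x, y)\}|}$$

• Only one person is called Elvis: $\mathit{fun}^{-1}(\text{label}, \text{Elvis Presley}) = 1$
• 10 people are born in 1935: $\mathit{fun}^{-1}(\text{born}, 1935) = 1/10$

[Figure: two instances, each with label "Elvis Presley" and born 1935; one also has marriedTo.]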
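The definition (one over the number of incoming links) can be checked on the slide's two cases. This is a small sketch on hypothetical toy facts; the triple layout and the function name `local_inverse_functionality` are assumptions made for illustration.

```python
# Hypothetical toy facts as (subject, relation, object) triples:
# one person labelled "Elvis Presley", ten people born in 1935.
facts = [("elvis", "label", "Elvis Presley")] + [
    (f"person{i}", "born", "1935") for i in range(10)
]


def local_inverse_functionality(facts, relation, obj):
    """One over the number of distinct subjects x with relation(x, obj)."""
    incoming = {s for s, r, o in facts if r == relation and o == obj}
    if not incoming:
        raise ValueError("the object has no incoming link for this relation")
    return 1.0 / len(incoming)


label_score = local_inverse_functionality(facts, "label", "Elvis Presley")
born_score = local_inverse_functionality(facts, "born", "1935")
print(label_score, born_score)
```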

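Global Inverse Functionality

The global inverse functionality of a relation is the harmonic mean of the local inverse functionalities:

$$\mathit{fun}^{-1}(r) = \frac{|\{y : \exists x\; r(x, y)\}|}{|\{(x, y) : r(x, y)\}|}$$

• Most people have distinct names
• Many people share their year of birth

[Figure: two instances, each with label "Elvis Presley" and born 1935; one also has marriedTo. The link between the two instances is marked "?".]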
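A short sketch of the harmonic-mean definition on hypothetical toy pairs follows. It also shows the arithmetic identity that the harmonic mean of the local values equals the number of distinct objects divided by the number of facts. The helper name `global_inverse_functionality` and the data are assumptions.

```python
from collections import defaultdict
from statistics import harmonic_mean


def global_inverse_functionality(pairs):
    """Harmonic mean of the local inverse functionalities of one relation.

    pairs: iterable of (x, y) such that the relation holds between x and y.
    """
    subjects_per_object = defaultdict(set)
    for x, y in pairs:
        subjects_per_object[y].add(x)
    local_values = [1.0 / len(xs) for xs in subjects_per_object.values()]
    return harmonic_mean(local_values)


# Hypothetical toy data: birth years are shared, names are not.
born = [("elvis", "1935"), ("anna", "1935"), ("bob", "1935"), ("carl", "1970")]
label = [("elvis", "Elvis Presley"), ("anna", "Anna Smith")]

born_score = global_inverse_functionality(born)
label_score = global_inverse_functionality(label)

# Same quantity as: distinct objects / number of facts.
born_ratio = len({y for _, y in born}) / len(born)
print(born_score, label_score, born_ratio)
```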


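Equality of Instances

x and x' shall be matched if they share an argument through a highly inverse functional relation:

$$\exists r, y, y' :\; r(x, y) \wedge r(x', y') \wedge y \equiv y' \wedge \mathit{fun}^{-1}(r) \text{ is high}$$

Let's compute the probability of this happening (adventurous independence assumptions go here):

$$\Pr(x \equiv x') = 1 - \prod_{r(x, y),\; r(x', y')} \bigl(1 - \mathit{fun}^{-1}(r) \cdot \Pr(y \equiv y')\bigr)$$

This evaluates to 1 iff

• There is at least one highly inverse functional r
• There is one shared argument y=y'

The probabilities $\Pr(y \equiv y')$ on the right-hand side are

• Literals: precomputed
• Others: recursive

[Figure: two instances, each with label "Elvis Presley" and born 1935; one also has marriedTo. The link between the two instances is marked "?".]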
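As a hedged illustration, assume the matching probability takes the form Pr(x ≡ x') = 1 − ∏ (1 − fun⁻¹(r) · Pr(y ≡ y')), with the product running over all pairs of facts r(x, y), r(x', y'). That form is an assumption about the formula on these slides. The sketch below evaluates it for the Elvis example; the inverse functionality values and the function name are hypothetical.

```python
from math import prod


def instance_equality(evidence, inv_fun):
    """Probability that x and x' are equal.

    evidence: list of (relation, Pr(y ≡ y')) entries, one per pair of
              facts r(x, y), r(x', y').
    inv_fun:  dict mapping a relation name to its global inverse functionality.
    """
    return 1.0 - prod(1.0 - inv_fun[r] * p for r, p in evidence)


# Assumed values: names are almost unique, birth years are widely shared.
inv_fun = {"label": 1.0, "born": 0.1}

only_born = instance_equality([("born", 1.0)], inv_fun)
label_and_born = instance_equality([("label", 1.0), ("born", 1.0)], inv_fun)
print(only_born, label_and_born)
```

Sharing only a birth year yields a low score, while sharing a label under a fully inverse functional relation drives the score to 1, which matches the two bullet conditions.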


Inequality of Instances

How do we guard against inequalities?

[Figure: two instances, both with label "Elvis Presley", one born 1935 and the other born 1970; one also has marriedTo. The link between the two instances is marked "?".]

These cannot be equal, because they have a different value for a functional relation.

The functionality is defined analogously to the inverse functionality.


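Inequality of Instances

Every value in one ontology should have a counterpart in the other, weighted by the functionality:

$$\prod_{r(x, y)} \Bigl(1 - \mathit{fun}(r) \prod_{r(x', y')} \bigl(1 - \Pr(y \equiv y')\bigr)\Bigr)$$

This factor makes sure that for every y, there is one equivalent y'.

[Figure: two instances, one born 1935 and the other born 1970, with marriedTo; a second pair of instances linked by sang to "Lavender Blue" and "Twinkle Twinkle", and to "All Shook Up" and "Jailhouse Rock".]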
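A sketch of such a penalty factor, assuming it has the form ∏ over facts r(x, y) of (1 − fun(r) · ∏ over facts r(x', y') of (1 − Pr(y ≡ y'))). Both this form and the functionality values below are assumptions made for illustration. The example contrasts a functional relation (born) with a non-functional one (sang).

```python
from math import prod


def inequality_factor(facts_x, facts_x_prime, fun, eq):
    """Penalty factor: close to 0 when a functional value of x has no equivalent.

    facts_x, facts_x_prime: lists of (relation, value) for x and x'.
    fun: dict mapping relation -> functionality.
    eq:  function (y, y') -> Pr(y ≡ y').
    """
    factor = 1.0
    for r, y in facts_x:
        # Probability that no value y' of x' under r is equivalent to y.
        no_equivalent = prod(
            1.0 - eq(y, y2) for r2, y2 in facts_x_prime if r2 == r
        )
        factor *= 1.0 - fun[r] * no_equivalent
    return factor


def crisp_eq(a, b):
    return 1.0 if a == b else 0.0


fun = {"born": 1.0, "sang": 0.1}  # assumed functionalities

born_clash = inequality_factor([("born", "1935")], [("born", "1970")], fun, crisp_eq)
same_birth = inequality_factor([("born", "1935")], [("born", "1935")], fun, crisp_eq)
sang_diff = inequality_factor(
    [("sang", "Lavender Blue")], [("sang", "All Shook Up")], fun, crisp_eq
)
print(born_clash, same_birth, sang_diff)
```

Different birth years rule the match out entirely, while different songs barely reduce the score, since one singer can sing many songs. In this sketch a relation that is entirely absent for x' is penalized like a clash, which is a simplification.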


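Equality of Classes

If all instances of one class are instances of the other, then the latter subsumes the former. Example: Rock-singer subclassOf singer.

$$\Pr(c \subseteq c') = \frac{|c \cap c'|}{|c|}$$

where $|c|$ is the number of instances of class c and $|c \cap c'|$ the number of instances of c that are also instances of c'.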
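A minimal sketch of the class estimate, assuming it is the expected fraction of instances of c that have an equivalent instance in c'. With crisp equalities this reduces to |c ∩ c'| / |c|. The function name, the toy instances and the crisp equality function are illustrative assumptions.

```python
def subclass_probability(instances_c, instances_c_prime, eq):
    """Expected fraction of instances of c with an equivalent instance in c'."""
    if not instances_c:
        raise ValueError("class c has no instances")
    total = 0.0
    for x in instances_c:
        miss = 1.0  # probability that x has no equivalent in c'
        for y in instances_c_prime:
            miss *= 1.0 - eq(x, y)
        total += 1.0 - miss
    return total / len(instances_c)


def crisp_eq(a, b):
    return 1.0 if a == b else 0.0


rock_singers = ["elvis", "jagger"]
singers = ["elvis", "jagger", "madonna"]

rock_in_singer = subclass_probability(rock_singers, singers, crisp_eq)
singer_in_rock = subclass_probability(singers, rock_singers, crisp_eq)
print(rock_in_singer, singer_in_rock)
```

The asymmetry of the two scores is what lets the estimate distinguish subsumption from equivalence.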


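Equality of Relations

If every pair of one relation is a pair of another relation, then the first is a sub-property of the second. Example: marriedTo subPropertyOf knows.

$$\Pr(r \subseteq r') = \frac{|\{\text{pairs of } r \text{ that are also pairs of } r'\}|}{|\{\text{pairs of } r \text{ that have a counterpart}\}|}$$

The numerator counts the pairs in the intersection; the denominator counts the pairs that have a counterpart in the other ontology.

[Figure: pairs of instances linked by knows and marriedTo in the two ontologies.]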
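The ratio of "those in the intersection" to "those that have a counterpart" can be sketched with crisp sets. The sketch assumes that instances have already been mapped to shared identifiers; that assumption, the helper name and the toy pairs are illustrative and not part of the slides.

```python
def subproperty_probability(pairs_r, pairs_r_prime, entities_prime):
    """Fraction of pairs of r, among those whose two arguments exist in the
    other ontology, that are also pairs of r'."""
    with_counterpart = [
        (x, y) for x, y in pairs_r
        if x in entities_prime and y in entities_prime
    ]
    if not with_counterpart:
        return 0.0
    in_intersection = [pair for pair in with_counterpart if pair in pairs_r_prime]
    return len(in_intersection) / len(with_counterpart)


# Hypothetical toy data.
married_to = {("elvis", "priscilla"), ("john", "yoko")}  # ontology 1
knows = {("elvis", "priscilla"), ("elvis", "john")}      # ontology 2
entities_1 = {"elvis", "priscilla", "john", "yoko"}
entities_2 = {"elvis", "priscilla", "john"}

married_in_knows = subproperty_probability(married_to, knows, entities_2)
knows_in_married = subproperty_probability(knows, married_to, entities_1)
print(married_in_knows, knows_in_married)
```

Restricting the denominator to pairs with a counterpart keeps a relation from being penalized for facts that the other ontology simply does not cover (here, the pair involving "yoko").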


Algorithm

1. Fix equalities for literals
2. Set equalities for relations to a small initial value
3. Iterate the estimations for relations and instances to convergence (*)
4. Compute the estimations for classes (final step)

(*) We have no proof for convergence, but it seems to happen
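Steps 1 to 3 can be sketched as a fixed-point loop on two toy ontologies. This is a heavily simplified illustration and not the authors' implementation: literal equality is identity, every relation is assumed fully inverse functional, the relation score is normalized by all facts of the relation, and step 4 (classes) is omitted. The initial value `THETA`, the data and all function names are assumptions.

```python
from itertools import product
from math import prod

# Hypothetical toy ontologies as (subject, relation, literal) triples.
O1 = [("e1", "label", "Elvis"), ("e1", "born", "1935"),
      ("m1", "label", "Madonna"), ("m1", "born", "1958")]
O2 = [("e2", "name", "Elvis"), ("e2", "birthDate", "1935"),
      ("m2", "name", "Madonna"), ("m2", "birthDate", "1958")]
RELS1 = sorted({r for _, r, _ in O1})
RELS2 = sorted({r for _, r, _ in O2})
THETA = 0.1  # assumed small initial value for relation equalities


def literal_eq(a, b):
    """Step 1: literal equalities are fixed upfront (identity here)."""
    return 1.0 if a == b else 0.0


def update_instances(rel):
    """Instance scores from relation scores (inverse functionality assumed 1)."""
    miss = {}
    for (x, r, y), (x2, r2, y2) in product(O1, O2):
        evidence = rel[(r, r2)] * literal_eq(y, y2)
        miss[(x, x2)] = miss.get((x, x2), 1.0) * (1.0 - evidence)
    return {pair: 1.0 - m for pair, m in miss.items()}


def update_relations(inst):
    """Relation scores: share of facts of r that have a counterpart under r2."""
    rel = {}
    for r, r2 in product(RELS1, RELS2):
        facts_r = [(x, y) for x, ra, y in O1 if ra == r]
        matched = 0.0
        for x, y in facts_r:
            no_match = prod(1.0 - inst[(x, x2)] * literal_eq(y, y2)
                            for x2, rb, y2 in O2 if rb == r2)
            matched += 1.0 - no_match
        rel[(r, r2)] = matched / len(facts_r)
    return rel


rel = {pair: THETA for pair in product(RELS1, RELS2)}  # step 2
inst = {}
for iteration in range(1, 21):                         # step 3
    new_inst = update_instances(rel)
    change = max(abs(new_inst[p] - inst.get(p, 0.0)) for p in new_inst)
    inst, rel = new_inst, update_relations(new_inst)
    if change < 1e-3:
        break

print(iteration, inst[("e1", "e2")], rel[("label", "name")])
```

On this toy input the scores for the matching instance pairs and for label/name rise together from the small initial value, while the scores of unrelated pairs stay at 0.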


Example

[Figure: worked example on two toy ontologies. Labels shown for Ontology 1: Elvis, dreamsOf, label "Elvis", label "Madonna". Labels shown for Ontology 2: name "Elvis", name "Madonna", "Ciccone", 1958, "Frozen". Alternate builds of the slide pair label (Ontology 1) with name (Ontology 2).]


Experiment: YAGO and DBpedia

• Independent class hierarchies
• 50% overlap of instances
• Independent properties
• Many complementary facts

[Figure: DBpedia and YAGO side by side. Sizes shown: 1.7m instances, 19m facts and 2.6m instances, 18m facts. Example labels: Singer, SuperHuman, born 1935 (in both), died 1977, gender male, "born with stillborn twin brother".]


Experiment: YAGO and DBpedia

Matching the instances:

Iteration   Change   Time    Precision   Recall
1           --       4:04h   86%         69%
2           12%      5:06h   89%         73%
3           1%       5:00h   90%         73%
4           0.3%     5:30h   90%         73%

Matching the classes:

• Precision: 74%
• Aligns roughly half of DBpedia's classes


Experiment: YAGO and DBpedia

Matching the relations (a trailing "-" marks the inverse relation):

yago:actedIn = dbpedia:starring-
yago:hasChild = dbpedia:child
yago:hasChild = dbpedia:parent-
yago:created > dbpedia:writer
yago:diedIn > dbpedia:placeOfBurial
... and many more.

Precision: 96%

+ more experiments on OAEI standard datasets and IMDb

All experiments run with the same settings. No parameter tuning.


Execution Time

• Essentially quadratic algorithm:

  $\forall x \in O_1,\ \forall x' \in O_2:\ \Pr(x \equiv x') = \dots$

• Optimizations make it possible to only compare entities that have something in common, but only because of our trivial literal matching mechanism
• Lots of random accesses: using an SSD rather than a magnetic hard drive gives us a 25× speedup!
• Still, at the limit of what is processable by a centralized, naïve implementation of our approach
• What if we had 10 million instances? 1 billion instances?

How to Improve Efficiency

• Parallelization (à la MapReduce) is very much possible:
  - In one iteration, the new value of $\Pr(x \equiv y)$ does not depend on the new value of $\Pr(x' \equiv y')$ for $x' \neq y'$.
  - (Costly) synchronization needed at the end of each iteration (similar to PageRank computation)
• Does not solve everything, though: does not change the fact that we deal with an essentially quadratic computation!
• Clever indexing definitely needed, but not trivial, since distance between literals might not be a nice metric
• Sampling probably the way to go: a human does not need to compare all pairs to determine that two relations are the same.
• What about on-demand matching of pairs, based on local neighborhoods?
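The parallelization argument can be illustrated with a small sketch: every new score is computed from the previous iteration's table only, so the pairs can be processed independently and the new table is published once all workers are done. The update rule, the `neighbours` structure and the use of a thread pool are illustrative assumptions. Threads in CPython mainly demonstrate the dependency structure here; a MapReduce-style framework would distribute the same map step across machines.

```python
from concurrent.futures import ThreadPoolExecutor


def new_score(pair, old_scores, neighbours):
    """New score for one pair, computed only from the previous iteration."""
    miss = 1.0
    for other in neighbours[pair]:
        miss *= 1.0 - old_scores[other]
    return 1.0 - miss


def parallel_iteration(old_scores, neighbours, workers=4):
    """One iteration: map over all pairs, then publish the new table at once."""
    pairs = list(old_scores)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(
            pool.map(lambda p: new_score(p, old_scores, neighbours), pairs)
        )
    # Synchronization point: all workers have finished before the swap.
    return dict(zip(pairs, values))


# Hypothetical toy input: two pairs that support each other.
old = {("a", "a'"): 0.5, ("b", "b'"): 0.2}
neighbours = {("a", "a'"): [("b", "b'")], ("b", "b'"): [("a", "a'")]}

new = parallel_iteration(old, neighbours)
sequential = {p: new_score(p, old, neighbours) for p in old}
print(new)
```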

Evolving Ontologies

• Ontologies are not static, new facts get added all the time (and some facts get deleted):
  - incrementally (e.g., IMDb);
  - each time the dataset is regenerated (e.g., YAGO).
• Raises two challenges:
  - Can matching be performed incrementally as well?
  - How to deal with time-dependent equalities (e.g., Barack Obama and President of the United States) and time-dependent functional dependencies?


Conclusion

• PARIS matches relations, instances and schema holistically
• PARIS has no parameters to tune
• PARIS shows high precision and recall
• PARIS allows the YAGO Elvis to marry the DBpedia Priscilla

[Figure: Elvis and Priscilla joined by the edges marriedTo and equals.]

Thank you.
