• Aucun résultat trouvé

On ambiguity and query-specific ontology mapping

N/A
N/A
Protected

Academic year: 2022

Partager "On ambiguity and query-specific ontology mapping"

Copied!
2
0
0

Texte intégral

(1)

On Ambiguity and Query-Specific Ontology Mapping

Aibo Tian, Juan F. Sequeda, and Daniel P. Miranker

Department of Computer Science, The University of Texas at Austin Austin, Texas, USA

atian@utexas.edu, {jsequeda, miranker}@cs.utexas.edu

Abstract. In the course of developing an ontology-based data integration sys- tem (OBDI) that includes automatic integration of data sources, and thus, in- cludes algorithmic ontology mapping, we have made the following observations.

A mapping method may determine that an entity in one ontology maps with equal likelihood to two or more entities in the other ontology. The mapping and refor- mulation of certain queries is correct only if one pairing is chosen. The correct choice may be different for different queries. Finally, the query itself may lend additional semantics that correctly resolve the ambiguity.

These observations suggest a targeted ontology mapping problem,query-specific ontology mapping. In addition to the two ontologies, a query serves as a third argument to the mapping algorithm. Further, the mapping algorithm need not produce a complete mapping, but only a partial mapping sufficient to correctly reformulate the query. We detail a number of open issues on how this problem statement might be refined, and consider features of its evaluation.

Ambiguity in Ontology Mapping:Consider the idealized representation (Fig. 1) of a critical issue in the automatic integration of new data sources in an OBDI sys- tem. T and S respectively represent target and data source ontologies. Looking at the ontologies alone, there is insufficient information to determine if the class T:People should be mapped to S:Teacher or to S:Student. A third possibility is a one-to-many mapping entailing both. Given the SPARQL query (Fig. 1c), it becomes clear that the query should be reformulated usingonlythe mapping{T:People = S:Teacher}. A com- plementary query about students should be reformulated usingonlythe complementary mapping. Thus, any static chose of one mapping will yield reformulated queries that return incorrect results.

Formulations of Query-Specific Ontology Mapping:In our system we compute a similarity matrix between all entities in the two ontologies [3]. The details may be borrowed from any ontology mapping algorithm that includes this step [2]. Given a query on the target ontology, our system uses a joint probability model to identify a maximal scoring, partial mapping that covers the target ontology entities mentioned in the query or that are needed to reformulate the query. Thus, our solution can be characterized as one that takes three arguments, and produces a partial mapping specific to the query.

There are at least two other approaches that may be considered and that produce a complete mapping and thus retain more of the standard definition of ontology matching.

First is to consider complex mappings. For example, instead of choosing{T:People =

(2)

time

name Course

string date People teacher student title

(a) OntologyT

hasSchedule

name Course

string date

Student name Teacher

place Schedule date teachBy takeBy name

(b) OntologyS

P ref ix course : < T /Course >

P ref ix people : < T /P eople >

Select ?t W here {

?c course:time ?t .

?c course:teacher ?p .

?p people:name “Einstein00 .}

Note that since the predicate of a triple pattern is not allowed to be a variable in our definition, there exists only one query graph for each queryq. The query graph of the SPARQL query in Figure??is shown in Figure??.

2.3 Problem Definition

A ss-path correspondence records the mapping confidence between two ss-paths.

Definition 9 (SS-PATH CORRESPONDENCE).Given two graphsGand G0, a ss-path correspondence between two ss-pathspandp0(denoted byπp,p0) is a tuple< p, p0, cp>, such thatpGRAPH-SS-PATH-SETG,p0GRAPH-SS- PATH-SETG0, andcpis a confidence measure.

We saypπp,p0, andp0πp,p0. We also useαπp,p0to denote the confi- dence measure, which isαπp,p0=cp. In the above definition, we assume the correspondence measures equivalence.

Definition 10 (MATCH CANDIDATE).Given a query graphTq, a graph Gis called a match candidate in terms of a set of correspondencesTq,G, where Tq,G=p,p0:pGRAPH-SS-PATH-SETTq, p0GRAPH-SS-PATH-SETG}, if the following conditions are satisfied:

Gis a subgraph ofS;

SINKGSINKS;

for all ss-pathpGRAPH-SS-PATH-SETTq, there exists exact one ss-path correspondenceπp,p0Tq,G, wherep0GRAPH-SS-PATH-SETG; for all ss-pathp0GRAPH-SS-PATH-SETG, there exists exact one ss-path

correspondenceπp,p0Tq,G, wherepGRAPH-SS-PATH-SETTq; for all pair of ss-pathsp1, p2GRAPH-SS-PATH-SETTq, if SOURCEp1

=SOURCEp2, the two corresponded ss-pathsp01, p02GRAPH-SS-PATH- SETG,πp1,p01Tq,G,πp2,p02Tq,G, also share the same source, SOURCEp01

=SOURCEp02;

(c) SPARQL query Fig. 1.Example ontologies and SPARQL query.

S:Teacher}or{T:People = S:Student}, the mapping system can detect “Teacher is the People who teaches” (similar for Student). However, to the best of our knowledge, there is no automatic system that can detect this kind of complex mapping.

Another approach may consider an entire workload of queries, as a batch or as a con- tinual pay-as-you-go refinement. In other words, a complete mapping is determined, but the information in a set of queries is used to bias the choices made. As many applica- tions comprise a set of dynamic web pages, their query set is easily identified. Consider the example and a course selection application. Since students are often interested in who is teaching a class, (and their grading policy), and privacy laws disallow revealing their fellow student’s enrollment, the mapping{T:People = S:Teacher}would always be correct. Incremental, pay-as-you go, solutions could integrate crowd-sourcing.

The pedagogical example’s brevity shouldn’t be used to diminish the problem’s im- portance. Comparing to Clio’s1algorithms our system demonstrates favorable results [1, 3]. Inspection of individual results suggests that resolving ambiguity is the primary source of improvement, and can be significant. However measuring the quality of the solutions, as a whole, and quantifying the frequency of ambiguity poses its own set of problems. Gold standard baselines must include queries and correct mappings. OAEI benchmarks cannot be used directly. Correct query reformulation may not require a unique mapping. Entity level ambiguity may not manifest wrt query reformulation, making it hard to identify through manual curation. To date, we have created three such test cases2. The test suite accommodates the unique mapping problem by includ- ing additional partial mappings and including test data corresponding query results.

Not all ambiguity may be revealed. Our inspection of individual results looked at the discrepancies between the two systems. False negatives are not quantifiable.

References

1. R. Fagin, L. Haas, M. Hern´andez, R. Miller, L. Popa, and Y. Velegrakis. Clio: Schema map- ping creation and data exchange.Conceptual Modeling: Foundations and Applications, 2009.

2. P. Shvaiko and J. Euzenat. Ontology matching: state of the art and future challenges. IEEE Transactions on Knowledge and Data Engineering, 2012.

3. A. Tian, J. F. Sequeda, and D. P. Miranker. Query specific ontology matching. Technical report, Department of Computer Science, University of Texas, 2012.

1Clio is an automatic relational schema mapping system. However, the algorithms are applica- ble to ontologies.

2The test cases are available, see http://www.cs.utexas.edu/~atian/page/dataset.html

Références

Documents relatifs

In contrast, the presence of any kind of disjunction in the ontology results in the existence of models that differ in the queries they entail, and makes the query entailment

In such scenarios all aforementioned systems would compute a rewriting for the input query over the new ontology from scratch discarding any information previously computed, although

The OxO mapping graph can be used to explore both direct and indirect mapping between terms, thus can be used to used to help traverse gaps where no di- rect

We describe and evaluate a semi-automatic method for mapping the large lexical network WordNet 3.0 (WN) to BFO 2.0 exploiting an existing mapping between WN and the KYOTO

The database schema of our Phenomis database considered for mapping contains plant phenotyping information and environmental information.. The ontologies considered for

We demonstrate a high-level RDB2OWL language for database-to- ontology mapping specification in a compact notation within the ontology class and property annotations, as well as

In this paper we present a case study to show how our ontology-based meta-data model for ontology mapping reuse (OM 2 R) can be applied for the OAEI to document

This year, we reinstantiated the OAEI library track 1 , i.e., we provide the ontology matching community with the challenge to create alignments between thesauri.. First, we aim