Reducing correspondences that contain mul-

7.3 Alignment combination operators

7.3.2 Alignment aggregation

7.3.2.3 Reducing correspondences that contain mul-

be confusing for an automatic usage), we apply an operator that selects one on the relations based on calculating its plausibility (or “upper probability function”). The calculation of the plausibility is based on the masses that are attributed to each relationr in the relations set of a correspondence.

Let c_k = he_x, ey, R_k[. . . ,(ri, m_A(ri), . . .]i be a correspondence in an ag-gregated alignmentA.

pl(ri) = 1− X

ri∩rj=∅

mA(rj); f or each rj ∈Rk

For example if we have the correspondence c as:

he₁, e3, R={({=},0.45),({@},0.45),({@,G},0.1)}i the plausibility of each relation in the setR is:

P l({=}) = 1−0.45−0.1 = 0.45, P l({@}) = 1−0.45 = 0.55, P l({G}) = 1−0.45−0.45 = 0.1,

P l({⊥}) = 1−0.45−0.45−0.1 = 0 =P l({A})

As a result the reduced correspondence is then represented as follows:

he₁, e₃,(@,0.55)i 7.3.3 Alignment union

Considering a given alignment A, if another alignment B “agrees” about a correspondence withA, then it is considered that the alignmentAbrings an extra proof that the alignment relation holds between two entities. Conse-quently, an aggregation in favour of maximising the confidence measure of the correspondences is applied.

LetAbe a set of alignmentsAi of a size nmatching the same resources R_source and R_target,the union of the alignments A_i is an alignment A_aggr matching between the same resourcesR_sourceandR_target and constructed as follows:

1. Create a new empty alignment A^S and generate its metadata;

2. For each alignment Ai get its correspondences and add them to the alignment A^S;

3. Divide the resulting correspondences set into subsets of conflicting cor-respondences Cconf. (Reminder: conflicting correspondences are cor-respondences having the same source and target entities).

4. Aggregate each set of conflicting correspondences using themax aggre-gator Aggr_∨

A^S=

m−1

[

k=0

{Aggr_∨({C_conf})}

wherem is the size of the list of sets of conflicting correspondences.

The importance of this operation is to enrich alignments between re-sources by adding new correspondences to them, which can be resumed to a U nion operation.

7.3.4 Alignment intersection

In the case of having multiple alignment resources matching the same two resources, some applications may require more precision in the alignments.

To be sure about a correspondence and its correctness or usefulness, the similarity measure is not the only indicator, the correspondence must be proposed by different alignment tools. Considering a given alignment A, if another alignmentB “does not agree” withA about a correspondence, then

it is considered that the alignmentA brings more an extra proof against the alignment relation that might hold between two entities. Consequently, an aggregation in favour of minimizing the confidence measure of the correspon-dences is applied.

LetAbe a set of alignmentsAiof a sizenmatching the same resourcesRs

andR_t, the intersection of the alignmentsA_iis an alignmentA_aggr matching between the same resourcesR_s andR_t and constructed as follows:

1. Create a new empty alignment A^T and generate its metadata;

2. For each alignment A_i get its correspondences and add them to the alignmentA^T.

3. Divide the resulting correspondences set into subsets of conflicting cor-respondencesCconf. If the size of each set is lower than the size of the listAthen drop all the correspondences of this set from the alignment.

4. For the rest of the sets, aggregate the correspondences of each, using themin aggregatorAggr_∧

A^T=

m−1

[

k=0

{Aggr_∧({C_conf})}

wherem is the size of the list of sets of conflicting correspondences.

The importance of this operation is to enhance the quality of alignments between resources by keeping only the correspondences that provided and agreed on by all the aggregated alignments.

7.3.5 Alignment difference

In order to analyze alignment quality and compare alignments between two resourcesR₁ andR₂, an operator for extracting the difference between align-ments is needed. This operator calculates and extracts the difference between an alignmentAi and an alignment Aj.

A=A_i\A_j ={(c_k)_c_k ∈Ai ∧(c_k ∈/ A_j)}=A_i∩ ¬A_j

For the correspondences that are found in both align-ments we introduce the difference between correspondences.

Let cl =

ex, ey,[(r1, µAj(r1)), . . . ,(rn, µAj(rn))]

and ck =

he_x, e_y,[(r₁, µ_A_i(r₁)), . . . ,(r_n, µ_A_i(r_n))]i where r_i ∈Θ, then for each pair of conflicting correspondences we define a difference between correspondences as:

c_k\c_l= Aggr_∧(c_k,¬c_l) where:

¬c_l=

e_x, e_y,[(> \r₁,(1−µ_A_j(r₁))), . . . ,(> \r_n,(1−µ_A_j(r_n)))]

7.4 Implementing alignment combination and man-agement operators

In order to manage alignment resources within the repository and based on the alignment model that we described in the chapter 6. We implemented the combination operators and a set of other operators for managing alignment resources (The repository can be accessed online¹). This repository is a direct application of our approach for representing and managing heterogeneous knowledge resources. We show that the proposed methodology is applicable on alignment resources and the abstract operators for resources combination and management are useful to generate new resources from existing ones.

As explained in chapter 6, the repository is implemented using a triple store technology (Allegrograph) and is based on the T OK_Ontodescribed in the chapter 4. We described the ability of the alignment model to support alignment resources represented in different formalisms. Figure 7.5 represents the architecture of the repository. In this section, we focus on describing the implementation of alignment management operators.

The implemented repository offers the following functionalities:

1. Managing alignment resources using operations such as: importing, updating, editing and exporting alignment resources;

2. Combining alignment resources by implementing the operators de-scribed above in the section 7.2: Composition, Aggregation, Union, Intersection, Difference.

3. Exploring entities within the repository:

(a) Search for resources;

(b) Search for composition paths between two resources based on their alignments;

1To access the repository please follow this URL:

http://129.194.69.195/tokonto/index.php, the content is subject to change since we are driving multiple tests

Triplestore (AllegroGraph) Alignment

Resources

Import

Ressources Selector

Operation (Composition, ...)

Generated Alignement

Queries (SPARQL) Resultts

interactive Interface Script

Interface

Global Alignment Model

Figure 7.5: Architecture of the alignment repository

7.4.1 Implementing fuzzy aggregators

The fuzzy aggregation operators (fuzzy conjunction and fuzzy disjunction) are well defines in literature and their properties have been established [Gupta & Qi 1991]. For instance, the min operator is one of the most used triangular norms and is commonly applied as a fuzzy conjunction (applied for the aggregation by intersectionAggr_∧).

In the literature [Gupta & Qi 1991], a triangular norm is defined as a function tdefined as follows:

t: [0,1]×[0,1]→[0,1]

(x, y)7→a

The t−norm operator satisfies the following axioms for each x, y, z ∈ [0,1]:

• monotonicity: x≤y =⇒ t(x, z)≤t(y, z)

• boundary condition: t(x,1) =x

• associativity: t(x, t(y, z)) =t(t(x, y), z)

• commutativity: t(x, y) =t(y, x)

In our case of study we implemented three aggregation operators that interpret fuzzy conjunction using each a different t−norm from the three typical following operators:

• minimum t−norm: tm(x, y) =min(x, y)

• productt−norm: t_p(x, y) =x.y

• Lukasiewiczt−norm [Łukasiewicz 1968]: t_l(x, y) =max(x+y−1,0) For the fuzzy disjunction operators we implemented three aggregation operators that use each a differentt−conormfrom the three typical following operators:

• maximum t−conorm: t^c_m(x, y) =max(x, y)

• productt−conorm: t^c_p(x, y) =x+y−x.y

• Lukasiewiczt−conorm [Łukasiewicz 1968]: t^c_l(x, y) =min(x+y,1) All the properties of these operators are defined in the literature [Gupta & Qi 1991] and many of their aspects explored especially the relation

∀x, y∈[0,1], tl(x, y)≤tp(x, y)≤tm(x, y).

7.4.2 Executing combination operators

A corpora of heterogeneous alignment resources represented in various for-malisms and generated by different tools are available within the repository.

In order to execute the combination operators described above we created two types of interfaces for applying them:

1. Operations by alignment: the entry point for executing alignment com-bination operators is the alignment itself;

2. Operations by resources: the entry point for executing alignment com-bination operators are the resources [source and target].

In order to execute a specific operation (composition, intersection, union, etc.) other parameters need to be specified:

1. choose the operation to perform;

2. the list of alignment resources to be involved in the operation (the alignments must have the same source and target resources);

3. the aggregation method to be applied (t−norm and t−conorm ag-gregators);

Home Alignment Operation Search SPARQL

TOKOnto

Operation by Alignment

Select operation Intersection Alignment URI * http://cui.unige.ch/isi/onto/tok/ola1.rdf

Alignment URI * http://cui.unige.ch/isi/onto/tok/cowl1.rdf + Method * min(x,y)

submit reset

Alignment : URI - null Type - simple

Created from - intersection Alignments used :

http://cui.unige.ch/isi/onto/tok/ola1.rdf http://cui.unige.ch/isi/onto/tok/cowl1.rdf Comment :

Source - http://book.ontologymatching.org/example/culture-shop.owl Target - http://book.ontologymatching.org/example/library.owl Number of correspondences : 1

Correspondence :

Entity Source = http://book.ontologymatching.org/example/culture-shop.owl#Book Entity Target = http://book.ontologymatching.org/example/library.owl#Volume Relations :

Relation = moreSpecific, Similarity = 0.6363

Alignment URI [Result] *

Comments

Save Alignment Cancel

Figure 7.6: Operations by alignments interface

When the parameters are set (see figure 7.6), a command is generated to execute the process and a new instance of the process is created in the

repository. The URI of the process execution instance is defined as the provenance of the generated alignment.

Execute : java -jar tokonto.jar [operation] [uris] [method] print

Once the operation is executed the result is recovered and displayed.

Metadata information such as the name of the generated alignment and its description are required to save the alignment and associate it to its provenance process.

Save : java -jar tokonto.jar [operation] [uris] [method] print

In order to apply operations on alignments using the second entry point, which is the resources (see figure 7.7) there is a possibility to select two resources from the repository. Once these resources identified then a path finding operation is applicable in order to selects all the alignments in the repository matching both resources (either direct alignments or path align-ments). Once the alignments are found, then there is a possibility to trigger operations on these alignments.

Figure 7.7: Operations by resources interface The commands that are executed for each option are:

• To retrieve direct alignments: java -jar tokonto.jar getAlignsByRes uri1 uri2

• To retrieve composition paths: java -jar tokonto.jar find uri1 uri2

In order to find all composition paths between two resources A and B within the repository we implement an in-depth graph parser.

Algorithm 1: Alignment path Finding algorithm (between two resources) Data uris : Uri of the source resource

uri_t : Uri of the target resource

Result P aths: list of paths (path is a list of Alignment Uris)

Variables visited: list of Alignments visited during the path searching Aligns =model.listResources(TOKJena.has_align_source,uris);

foreachAlign_i ∈Alignsdo visited.add(Align_i);

findPath(Aligni,urit,visited,P aths);

end foreach returnP aths;

Algorithm 2: Alignment path finding (based on an alignment) Data Align: alignment

urit : URI of the target resource

visited: List of visited Alignments (URI) P aths : List of paths

tU ri =Align.getTargetResource();

if tU ri.equals(uri_t)then P aths.add(visited);

else

Aligns =model.listResources(TOKJena.has_align_source,tU ri);

foreachAligni ∈Alignsdo

if visited.contains(Align_i.getURI()) then

continue;

else

temp = new List(visited);

temp.add(Align_i);

findPath(Align_i,uri_T,temp,P aths);

end if end foreach end if

7.4.3 Alignments overview, update and edition

We created interfaces for viewing, editing and updating alignment resources.

The edit and update operators are monitored within the repository and their

execution generates a change event that creates a new version of the align-ment resource and triggers the execution of all the processes where the cur-rent alignment has been used.

Algorithm 3: Update Algorithm Data Align: new alignment

uri : URI of the alignment to update model =maker.openGraph(uri);

model .removeAll();

import(Align);

updateAssociated(Align);

Algorithm 4: Re-generation of derived alignments Data Align: new alignment

uri =Align.getURI();

// Getting alignments associated with Align

uri_aligns=model.listResourcesWithProperty(TOKJena.Collection_of, tU ri);

foreachuriAi ∈uri_aligns do

// Getting alignments used to createuriAi

uri_used =model.listObjectOfProperty(model.getResource(uriA_i), TOKJena.Collection_of);

foreachuri_uj ∈uri_used do Aligns.add(export(uri_uj));

end foreach

// Getting the operation used to createuriAi

operation=uri_uj.getProprety(TOKJena.Created_from);

newAlign=operation(Aligns);

// Updating the alignmenturi_uj

mod=maker.openGraph(uri_uj);

mod.removeAll();

import(newAlign);

updateAssociated(newAlign);

end foreach

We retrieve all the alignments having as provenance a process that uses operations involving the current modified alignment. For each alignment in the list we re-execute the generation process (which triggers again an update event).

7.4.4 Discussion about the aggregation metrics

Our approach for combining alignment resources is based on a mathematical background and does not arbitrarily combine or optimize some parameters to enhance the results. The interpretation of the alignments as sets of fuzzy relations or using the Dempster-Shafer theory were explained and detailed in section 7.2.2 and section 7.2.3. Operators for respecting the coherence of the properties and aspects of each interpretation were proposed and imple-mented. Our framework for alignment combination is an original combina-tion between the algebra of alignment relacombina-tions [Euzenat 2008] as a theory for combining alignment relations and robust belief combination theories.

For combining confidence measures there is a need to apply functions and metrics to calculate the resulting belief or mass associated to an align-ment relation within a correspondence. The metrics that we described in section 7.4.1 are proposed in the literature and have the required properties to respect the mathematical grounding of our combination operators. These metrics are also used for the composition and their usage preserves the as-sociativity of the proposed alignment combination operators (composition, aggregation, intersection, etc.).

Associativity is a very important aspect in our context of application since it can be used in the context of combining multiple alignments. If the composition operator were not associative, then for each of its applications, especially to compose a path of alignments between two resources, multiple combinations would be possible (which is not desired in this context).

7.5 Discussion

In this chapter, we defined a set of combination and management operators designed for alignment resources. We proposed an operator for composing alignment resources using uncertainty theories and an operator for aggregat-ing alignment resources generated by different matchers usaggregat-ing a fuzzy theory and an evidence combination theory. We also created a repository for align-ment resources based on the alignalign-ment model and implealign-mented the proposed operators within an API. The description of these operators was supported by the operators model [Ghoulaet al. 2014];

These operators are instances of the abstract operators that we described in chapter 5. We implemented these operator on top of a triple store based on the alignment representation model that we introduced in chapter 6. We de-fined a theoretical background for interpreting alignments using uncertainty-based theories. These interpretations allowed us to define composition and aggregation operators and to create a framework for managing and combin-ing alignment resources.

We did not focus on the semantic aspects in this contribution, we as-sume that the alignment consistency checking can be implemented using current guidelines [Beisswanger & Hahn 2012] or components integrated in alignment tools [Zimmermann & Jérôme 2006, Jiménez-Ruiz & Grau 2011].

The next chapter is dedicated to define en evaluation methodology for the implemented alignment repository and describe experimental results of ap-plying the combination operators.

Evaluation of alignment resources combination operators

Contents

8.1 Evaluation methodology . . . 146 8.1.1 Building a test corpus . . . 146 8.1.2 Computing precision and recall measures . . . 147 8.1.3 Evaluation of combination and aggregation operators . 148 8.2 Experimentation and results . . . 149 8.2.1 Alignment union evaluation results . . . 151 8.2.2 Alignment intersection evaluation results . . . 153 8.2.3 Alignment composition evaluation results . . . 155 8.3 Usage of alignment composition to enrich existing

alignments . . . 159 8.4 An approach for enhancing composition using the

content of the resources . . . 160 8.4.1 Extending composition path finding using the content

of a common resource . . . 162 8.4.2 Composition path finding using an alignment

exten-sion operator . . . 164 8.5 Conclusion and discussion . . . 166

In this chapter, we propose a methodology for evaluating the align-ment combination operators that we described, defined and implealign-mented in the previous chapter. We used the resources representation model and the generic model for representing alignment resources to build a repository of alignment resources. We choose to evaluate the approach using ontology alignment resources and we apply the operators of aggregation and composi-tion to see if they can effectively improve the quality of ontology alignments.

The usage of ontology alignments as an evaluation case is motivated by the

availability of these resources and by the nature of challenges in this re-search field. We can also apply these operators as they were implemented on terminological or linguistic alignments.

8.1 Evaluation methodology

In order to evaluate the proposed combination and consequently the aggre-gation functions we used the metrics of precision and recall to compare the generated alignments with reference alignments. The evaluation methodol-ogy is based on the following steps:

1. building a test corpora containing:

• knowledge resources from different domains;

• reference alignments between these resources;

• a set of alignment tools to create multiple alignments between the resources.

2. compare the resulting alignments from the application of the combina-tion operators with the reference alignments and then the best align-ment tools.

8.1.1 Building a test corpus

For the evaluation of alignment composition and aggregation, we need to build a corpus of heterogeneous alignment resources. Alignments must be represented in different formalisms and between two resources we need to collect more than one alignment to have the right ingredients for aggregating and composing alignments and to create at each case a reference alignment to compare results with. Thus, the requirements for building this kind of corpus are:

• the matching process between resources from the same domain should generate a non empty set of alignments;

• for each resource there exists at least another resource in the corpora where the alignment between them is not empty;

• the corpora must contain ontologies represented in different levels of expressivity (to consider a wider evaluation, resources in this cor-pora must be of different kinds: ontological, terminological, termino-ontological or lexical);

• for each pair of aligned resources, there must be a possibility to easily reuse or build a reference alignment.

In order to control the quality of combined alignments, there is a need for reference alignments between the resources of the corpora. For example, for the set of resources: {R_x, Ry, Rz, Ru} from the same domain, the reference alignments that must be found or built are between the couples of resources:

(R_x, R_y); (R_x, R_z); (R_x, R_u); (R_y, R_z); (R_y, R_u) and(R_z, R_u).

Multiple matching tools and services generate alignments between re-sources. Using the resources’ corpus and a set of matchers (ontology match-ers, terminology matchmatch-ers, linguistic matchmatch-ers, etc.), a corpus of alignment resources can be generated.

8.1.2 Computing precision and recall measures

The usage of precision and recall calculation in most alignments evaluation campaigns are based only on the existence or not of the required relation within the resulting alignment compared to a reference alignment. The con-fidence measure is not taken into consideration although it is an important feature within matching tools.

For instance, if an alignment tool detects a correspondence c = he_x, e_y,{≡}_0.5i, and the reference alignment contains c_ref = he_x, e_y,{≡}i, the correspondence is computed as a relevant answer (which is not really fully relevant). [Euzenat 2007] proposed new metrics about semantic preci-sion and recall calculation for evaluating ontology alignments but they also do not consider the confidence measures.

P recision= {OriginalCorrespondances} u {RetrievedCorrespondances}

{RetrievedCorrespondances}

Recall= {OriginalCorrespondances} u {RetrievedCorrespondances}

{OriginalCorrespondances}

Since confidence measures combination is an important feature in our approach, we propose to consider the margin value between the confidence measure of the reference correspondences and the generated (aggregated) correspondences. Let c_ref and ccomp be two correspondences respectively members of a reference alignment and a generated alignment. The compari-son of both correspondences is done as follows:

1. Feature 1: If the evaluation method is based on the basic defi-nition of precision and recall measurements, then these metrics are

applied and if both correspondences are equal then count it as a relevant one. For example, if an alignment tool detects a corre-spondence c = he_x, ey,{≡}_0.2i, and the reference alignment contains c_ref =he_x, ey,{≡}i, then we increment the number of relevant corre-spondences within the resulting alignment by “1”.

2. Feature 2: If we consider the value of the confidence measure within the alignment, then the number of correct correspondences is increased by the margin value between the confidence measure of the reference correspondence and the confidence measure of the generated corre-spondence. Thus, instead of incrementing the number of relevant cor-respondences by “1”, it is incremented by (1−(ABS(c_ref.degree− c_comp.degree))). For instance, if an alignment tool detects a

Dans le document An ontology-based repository for combining heterogeneous knowledge resources (Page 155-0)