
6.5 Importing alignment resources using TOK Align model

6.5.2 Importing and exporting alignments

This operator extracts alignment entities represented in different alignment formalisms and creates representations of the imported alignments within the repository using the global alignment model that we defined in the previous section.

Instead of creating a parser for each alignment formalism, we collect mappings between these formalisms and the global alignment model. We use these mappings to extract alignments from the input files, and then create instances of these resources and their content within the repository. The mapping file guides the parser and enables the program to identify alignment entities and transform them into instances within the repository.

The transformation and import (see Figure 6.3) of an alignment represented using one of the previously described alignment formalisms follows these steps:

1. Parse the configuration file and get the tags of the required elements based on the alignment file’s format;

2. Analyse the content of the alignment file (or URL) using an XML parser based on the StAX⁵ API, and create an instance of the alignment resource as an “Alignment”, with instances of its content as correspondences, entities, and relations;

3. Store the “Alignment” in the repository.
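The first two steps above can be sketched as follows. This is a minimal illustration, not the prototype's code: the prototype relies on a Java StAX parser, whereas this sketch uses Python's standard `xml.etree.ElementTree.iterparse` as a comparable streaming parser. The mapping dictionary `FORMAT_MAPPING`, the format name `simple-xml`, the tag names and the sample file are all assumptions made for the example; step 3 (storing the result in the repository) is left out.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical mapping configuration: for each input format, the tag that
# plays the role of each element of the generic alignment model.
FORMAT_MAPPING = {
    "simple-xml": {
        "correspondence": "Cell",
        "source": "entity1",
        "target": "entity2",
        "relation": "relation",
        "confidence": "measure",
    },
}


@dataclass(frozen=True)
class Correspondence:
    source: str        # URI of the source entity
    target: str        # URI of the target entity
    relation: str
    confidence: float


@dataclass
class Alignment:
    correspondences: list = field(default_factory=list)


def import_alignment(stream, format_name):
    """Stream through an alignment file and build a generic Alignment."""
    tags = FORMAT_MAPPING[format_name]  # step 1: look up the format's tags
    alignment = Alignment()
    # Step 2: read the XML incrementally; an "end" event means the element
    # and all of its children have been fully parsed.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != tags["correspondence"]:
            continue
        alignment.correspondences.append(Correspondence(
            source=elem.find(tags["source"]).get("resource"),
            target=elem.find(tags["target"]).get("resource"),
            relation=elem.findtext(tags["relation"]).strip(),
            confidence=float(elem.findtext(tags["confidence"])),
        ))
        elem.clear()  # release the subtree once it has been converted
    return alignment


# Made-up input file in the hypothetical "simple-xml" format.
SAMPLE = b"""<Alignment>
  <Cell>
    <entity1 resource="http://example.org/onto1#Heart"/>
    <entity2 resource="http://example.org/onto2#Cardiac_Organ"/>
    <relation>=</relation>
    <measure>0.9</measure>
  </Cell>
</Alignment>"""

alignment = import_alignment(io.BytesIO(SAMPLE), "simple-xml")
```

Supporting another formalism would then only require a new entry in the mapping, which is the point of driving the parser from a configuration file instead of writing one parser per format.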

[Figure 6.3 labels: input (Mapping - Formats, Mapping - Relations, Alignment files); Parser; output (Generic representation, RDF Triples); Alignment TripleStore (AllegroGraph)]

Figure 6.3: Architecture of the resources’ import component

Figure 6.4 shows an excerpt of the XML mapping file that is used as input for the abstraction (import) operator.

Figure 6.4: Excerpt of mappings between alignment formalisms and the generic alignment model

Since we use ontologies as a means of knowledge representation and RDF as the formalism for storing instances of the ontology TOK_Onto, we chose to implement the repository as a triple store. Once an alignment is represented using the global model, we use the Jena⁶ API to store it within the RDF triple store based on AllegroGraph⁷.

⁵ http://docs.oracle.com/javase/tutorial/jaxp/stax/api.html

⁶ http://jena.apache.org/

We chose AllegroGraph (see Figure 6.5) because it supports RDF++ reasoning and offers multiple options for indexing triples and querying the triple store. It also provides an API that integrates Jena, which is practical for building importers and exporters of knowledge resources. Jena is a Java API for building Semantic Web applications, composed of:

• Interfaces for manipulating RDF resources;

• Interfaces for manipulating OWL ontologies;

• A SPARQL query engine;

• A rule-based reasoner.

[Figure 6.5 labels: Client (C#, Jena, Sesame, Lisp, Java, Python, Clojure, any HTTP client in Java, Ruby, ...); Server (Direct Server, HTTP Server, Sesame REST Server, SPARQL Protocol Server, Common Server Services); AllegroGraph RDF Store; Storage]

Figure 6.5: AllegroGraph’s Architecture

We used the interfaces for RDF management in order to transform the alignments represented in the global model into RDF triples and add them to the graph of the repository. The usage of a triple store is motivated by the need to manage semantic information within an ontological context.
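The transformation of a correspondence into RDF triples can be illustrated with the sketch below. It is only an approximation under stated assumptions: the prototype uses the Jena API on top of AllegroGraph, whereas this example uses plain Python tuples and a set as a stand-in graph. The namespace `http://example.org/align#` and the property names (`source`, `target`, `relation`, `confidence`) are hypothetical placeholders, not the vocabulary of the generic alignment model.

```python
# Hypothetical namespace standing in for the generic alignment vocabulary.
ALIGN = "http://example.org/align#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def correspondence_to_triples(cell_uri, source, target, relation, confidence):
    """Return the triples describing one correspondence.

    URI objects are plain strings; the typed literal is a
    (lexical value, datatype URI) tuple.
    """
    return [
        (cell_uri, RDF_TYPE, ALIGN + "Correspondence"),
        (cell_uri, ALIGN + "source", source),
        (cell_uri, ALIGN + "target", target),
        (cell_uri, ALIGN + "relation", ALIGN + relation),
        (cell_uri, ALIGN + "confidence", (str(confidence), XSD_FLOAT)),
    ]


def to_ntriples(triples):
    """Serialise triples as N-Triples lines.

    Literal values are assumed to need no escaping (numbers only here).
    """
    lines = []
    for s, p, o in triples:
        if isinstance(o, tuple):
            value, datatype = o
            obj = f'"{value}"^^<{datatype}>'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


triples = correspondence_to_triples(
    "http://example.org/repo/cell/1",
    "http://example.org/onto1#Heart",
    "http://example.org/onto2#Cardiac_Organ",
    "equivalence",
    0.9,
)

# A set behaves like an RDF graph: adding the same triple twice has no effect.
graph = set()
graph.update(triples)
graph.update(triples)

ntriples = to_ntriples(triples)
```

Because a graph is a set of statements, importing the same correspondence twice leaves the graph unchanged, which is one practical consequence of storing alignments as triples.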

A triple store (as detailed in Chapter 2) is a knowledge base management system for the Semantic Web that stores, queries and manages RDF data. It stores one specific type of data, RDF statements (triples), which can be retrieved using SPARQL (SPARQL Protocol and RDF Query Language). For our prototype we used a native triple store called AllegroGraph, since it uses the RDF model and offers a Java client that integrates the Jena API. Each imported alignment can be exported using the generic alignment namespace. The export algorithm uses the resource’s graph represented within the repository and generates an RDF file.

⁷ http://franz.com/agraph/allegrograph/

Once alignment resources have been imported and stored in the repository, operations such as Merge, Intersection or Composition can be executed; they generate new alignments that are added to the repository. Each alignment is stored together with metadata elements describing its provenance (generating tool, institution, author, source, target, etc.). Entities used in the alignments are unique and are referenced by their URIs, so there is no risk of duplication or redundancy. Figure 6.6 shows the interface of the repository of alignment resources, which is built using the ontology TOK_Onto and the generic alignment model. This interface provides the form for uploading alignment files represented in one of the previously described formalisms. When an alignment is uploaded and imported successfully, an excerpt of its metadata is displayed.
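The two storage properties mentioned above (provenance metadata attached to each alignment, and entities shared through their URIs) can be pictured with a small in-memory stand-in for the repository. The class `Repository`, its method names and the metadata values are hypothetical and chosen only for illustration; the actual repository is a triple store.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """In-memory stand-in for the alignment repository (illustrative only)."""
    entities: dict = field(default_factory=dict)    # URI -> entity record
    alignments: dict = field(default_factory=dict)  # alignment id -> record

    def entity(self, uri):
        # An entity is created on first use and reused afterwards, so the
        # same URI never yields two records.
        return self.entities.setdefault(uri, {"uri": uri})

    def add_alignment(self, alignment_id, pairs, **metadata):
        # Correspondences refer to entities by URI; provenance metadata is
        # stored next to the alignment content.
        cells = [(self.entity(s)["uri"], self.entity(t)["uri"])
                 for s, t in pairs]
        self.alignments[alignment_id] = {"cells": cells, "metadata": metadata}


repo = Repository()
repo.add_alignment(
    "alignment-1",
    [("http://example.org/onto1#Heart", "http://example.org/onto2#Cardiac_Organ")],
    tool="hypothetical-matcher", author="expert A",
)
repo.add_alignment(
    "alignment-2",
    [("http://example.org/onto1#Heart", "http://example.org/onto3#Heart")],
    tool="manual", author="expert B",
)
```

Both alignments mention `onto1#Heart`, yet the repository holds a single record for it, which mirrors the "no duplication" property of URI-based references.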

Figure 6.6: Importing an alignment between two biomedical ontologies

6.6 Discussion

In this chapter we categorized alignment resources and designed a generic model for representing and storing alignments [Ghoula et al. 2013].

The alignment representation model is one of the proposed refinements of the resource representation model, and it applies our approach to resource representation. This approach states that resources are described in general terms using the meta-model, and that their content is described using subclasses of the meta-model (Node_Entity, Link_Entity and Expression_Entity). The model proposed in this chapter defines a generic representation of all resources of the type “Alignment”.

The vocabulary of this model is integrated in the resources model (TOK) using class subsumption axioms (e.g. Correspondence ⊑ Expression_Entity ⊑ Resource_Entity). This shows the flexibility of the representation approach and the ability to represent resources using specific or generic vocabularies.

In the next chapter we will use this model as a basic framework to define operators for managing and combining alignment resources. This is a direct application of the proposed methodology for combining knowledge resources.

Operators for combining and aggregating heterogeneous alignment resources

Contents

7.1 Approaches for alignment resources reuse
    7.1.1 Approaches reusing existing alignments
    7.1.2 Approaches proposing theories for alignment composition
7.2 An approach for alignment resources combination
    7.2.1 Framework of representing alignment correspondences
    7.2.2 Interpretation of correspondences using fuzzy set theory
    7.2.3 Interpretation for Dempster-Shafer theory
    7.2.4 Switching from an interpretation to another
7.3 Alignment combination operators
    7.3.1 Alignment composition
    7.3.2 Alignment aggregation
    7.3.3 Alignment union
    7.3.4 Alignment intersection
    7.3.5 Alignment difference
7.4 Implementing alignment combination and management operators
    7.4.1 Implementing fuzzy aggregators
    7.4.2 Executing combination operators
    7.4.3 Alignments overview, update and edition
    7.4.4 Discussion about the aggregation metrics
7.5 Discussion

In this chapter we propose a methodology for combining alignment resources. The approach defines a set of knowledge engineering operators that are used to derive new alignments from existing ones generated by different tools. For instance, we propose a composition operator that generates a set of correspondences between entities from two knowledge resources based on the alignments between them. We describe two interpretations of alignments, each based on a different uncertainty theory (fuzzy set theory and Dempster-Shafer theory). The comparison of the results of both theories is detailed in the following chapter.
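To give an intuition of what composition produces, the sketch below derives correspondences between resources A and C from an alignment A-B and an alignment B-C. It is a simplified illustration and not the operator defined later in this chapter: it assumes that all correspondences are equivalences, combines confidence values with the minimum (a common fuzzy t-norm, chosen here as an assumption), and keeps the best value when several paths lead to the same pair. The function name `compose` and the entity names are hypothetical.

```python
def compose(ab, bc, combine=min):
    """Compose two alignments given as {(entity1, entity2): confidence}.

    Assumes every correspondence is an equivalence; `combine` merges the
    confidences along a path (minimum by default).
    """
    result = {}
    for (a, b1), conf_ab in ab.items():
        for (b2, c), conf_bc in bc.items():
            if b1 != b2:
                continue  # the two correspondences do not share an entity
            score = combine(conf_ab, conf_bc)
            # Several intermediate entities may link a to c: keep the best.
            result[(a, c)] = max(score, result.get((a, c), 0.0))
    return result


ab = {("a1", "b1"): 0.9, ("a1", "b2"): 0.6}
bc = {("b1", "c1"): 0.8, ("b2", "c1"): 0.7}
ac = compose(ab, bc)
```

Here `a1` reaches `c1` through `b1` with confidence min(0.9, 0.8) = 0.8 and through `b2` with min(0.6, 0.7) = 0.6, so the composed alignment contains the single correspondence (`a1`, `c1`) with confidence 0.8. Passing another function as `combine` (for example a product) changes how uncertainty propagates along a path.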

Alignments are the output of a matching process that generates correspondences between entities of heterogeneous knowledge resources. A large number of alignment methods have been proposed to help automate the creation of bridges between different kinds of resources (ontologies, terminologies, corpora, etc.). A considerable number of alignments are being created, either by human experts or by automatic matching tools. Collecting and managing these alignments is therefore useful in order to compare them or to combine them to enhance their quality.

7.1 Approaches for alignment resources reuse

Alignments of a good quality between complex knowledge resources are costly to create, mostly because they are validated or built by a human expert.

The result of a matching process of this kind is a valuable resource that must be stored, shared and reused. In particular, the idea of generating new alignments by composing already existing ones is appealing. Indeed, several studies have stated the importance and usefulness of composing alignments, and multiple systems and tools have been proposed to combine alignment methods [Euzenat 2004, Parida et al. 1998].

Nevertheless, only a few concrete tools, such as the alignment server [David et al. 2011], have been created to manage alignment resources. A notable exception can be found in the natural language processing area, where it is common practice to build bilingual lexicons or aligned sentences by transitivity. We classified these methods into two categories:

1. to show the utility of reusing alignment resources we describe some approaches that use existing alignments to generate new ones;

2. to show the current state of the art in alignment combination, we describe some approaches that propose algebras for alignment relations and theoretical foundations for defining alignment composition.