• Aucun résultat trouvé

Automated Exploration of Ontology Repositories

N/A
N/A
Protected

Academic year: 2022

Partager "Automated Exploration of Ontology Repositories"

Copied!
2
0
0

Texte intégral

(1)

Automated Exploration of Ontology Repositories

Ondˇrej Zamazal and Vojtˇech Sv´atek

University of Economics, W. Churchill Sq.4, 130 67 Prague 3, Czech Republic {ondrej.zamazal|svatek}@vse.cz

Motivation The choice of adequate ontology repository is an important pre- requisite to finding an ontology to be reused or adapted for a concrete use case.

As the repositories are mostly affiliated to particular communities within the semantic web, understanding the typical features of ontologies in each of them is also helpful for designers of ontology management tools.

Overall Process, Metrics and Results Our ontology exploration process includes ontology collection, materialization and then metrics computation; finally, the resulting metrics areexplored using theR language1to automatically get asum- mary report in the form of tables. To automate the collection phase, we partly employedOntohub,2which is an open ontology repository mirroring several other repositories. The materialization includes ontology storing (into the database) in order to decompose them into entities, names, relations, imported ontologies and head nouns. We use the OWL-API3 to manipulate the ontologies.

We considered metrics related to four aspects of ontologies.4 Logical and structural metrics include, e.g., the numbers of different types of entities and axioms. We also categorize the complexity of ontologies into bins (as in [2, 3]).

Thenaming aspect reflects some basic information regarding the length of class name (local fragments of URI or labels), capitalization and usage of concatena- tion symbol/technique, i.e. a hyphen, underscore, camel-case or dot (as in [1]).

For theannotation aspect we compute the proportions of RDFS annotations.

We explored ontologies from five prominent ontology repositories (Table 1 contains just a few selected metrics). Due to parsing problems, unavailability of ontologies or their imports we however did not collect all ontologies from the repository.BioPortal5 is a web portal providing access to a library of well- curated biomedical ontologies via REST-ful services. It contains ontologies from another ontology repository, the OBO Foundry.6 We collected ontologies using the Ontohub mirror where only ontologies with size below 5MB (thus only 342 of

1 http://www.r-project.org/

2 https://ontohub.org/

3 http://owlapi.sourceforge.net/

4 Due to the space limitation full list of metrics and complete results are at the sup- plementary web page:http://owl.vse.cz:8080/MetricsExploration/

5 http://bioportal.bioontology.org/

6 http://obofoundry.org/

(2)

Metrics (June 2014 snapshot) BioPortal Dumontier LOV Prot´eg´e TONES

Ontologies processed 254 70 353 41 183

Percentage of all 74% 95% 83% 44% 88%

Complex class using existential restr. Avg 57% 28% 7% 14% 43%

Complex class as superclass Avg 74% 35% 69% 57% 67%

Branching Avg 0.88 0.55 0.48 0.61 0.79

Max 2.39 1.09 1.77 1.43 1.78

Multiple inheritance Avg 32 0 1 6 12

Max 1877 254 321 497 24800

Annotation as label Avg 38% 51% 32% 13% 38%

Annotation as comment Avg 1% 37% 25% 49% 37%

Camel technique Avg 15% 61% 39% 28% 29%

Underscore technique Avg 54% 0% 0% 23% 36%

Table 1.Selected metrics. Average (Avg) is either mean or median, according to better represen- tativeness. The min statistics is omitted since it is always zero. The max statistics is omitted for ratios because it is nearly always 100%. The larges value across all repositories is in bold.

the total) are available.7TheDumontier lab ontologies8are biological ontologies aimed at knowledge representation and reasoning. Their ontologies are quite interconnected (many mutual imports). LOV9 is a well-curated collection of linked open vocabularies used in the Linked Data Cloud. TheProt´eg´e ontology library mostly contains ontologies developed within the Prot´eg´e editor. As there is no programmatic access to the library, we manually downloaded them. It turned up that out of 93 ontologies (except Dumontier ontologies on which there is also a link) 43% ontologies were not available. Finally, theTONES repository (using its Ontohub mirror of 207 ontologies - collected 88%) contains ontologies of various domains, many of them however designed for testing purposes.

Future Work We plan to run such an analysis repeatedly, include more reposi- tories (preferably via Ontohub) and more metrics. We also want to keep the on- tology exploration services available via a web interface10where the users could ask, on the one hand, for the latest summaries of particular repositories, and on the other hand for particular ontologies or ontologies meeting some criteria.

Ondˇrej Zamazal has been supported by the CSF grant no. 14-14076P and this research is also supported by UEP IGA project F4/34/2014 (IG407024).

References

1. Manaf N. A. A. M., Bechhofer S., Stevens R.: A Survey of Identifiers and Labels in OWL Ontologies. In: OWLED-2010.

2. Matentzoglu, N., Bail, S., Parsia, B.: A Snapshot of the OWL Web. In: ISWC 2013.

3. Wang T. D., Parsia B., Hendler J.: A Survey of the Web Ontology Landscape. In:

ISWC-2006.

7 To overcome 5MB limitation we gathered BioPortal ontologies directly by RESTful services. Corresponding ontology metrics are available via the supplementary web.

8 http://dumontierlab.com/?page=ontologies

9 http://lov.okfn.org/dataset/lov/

10A sample service, providing metrics for a given ontology, is already available from http://owl.vse.cz:8080/MetricsExploration/.

Références

Documents relatifs

Despite the suggested positive effects of ODPs, the relatively modest use of ODPs in BioPortal may be due to ontology age, domain specificity, minimal tooling support, lack of

 Proposition and evaluation of a 2D and 2.5D hyperbolic tree visualization for exploring relationships between classes of ontologies (Silva and Freitas, 2011b);..  Proposition

The collaboration is based on metadata information about the ontology components, so that the users can express their understanding about the meaning of the concepts without

SOBOLEO supports five function groups (detailed below) to support a group of people in the joint structuring of an information space containing an ontology, data about documents

SEMIC.EU as the point of reference for eGovernment ontologies 5 At least one release of an asset has been approved as mature by the clearing process manager and

The rules presented here reflect the main characteristics of the classes of URIs discussed in Section 2, but can be easily extended for additional patterns (many of them being

the provided reference implementation the OPPL instructions provider is a flat file parser (by convention, files with the suffix .oppl are used), but it would be simple to program

Our contribution is to present a scenario testing approach using formal ontologies to validate that a UML class diagram represents the business requirements accurately.. Our