• Aucun résultat trouvé

Automated Knowledge Hierarchy Assessment

N/A
N/A
Protected

Academic year: 2022

Partager "Automated Knowledge Hierarchy Assessment"

Copied!
2
0
0

Texte intégral

(1)

Automated Knowledge Hierarchy Assessment

G. Nayak S. Dutta D. Ajwani P. Nicholson A. Sala Univ. of Minnesota, USA Nokia Bell Labs, Ireland

nayak013@umn.edu {firstname.lastname}@nokia-bell-labs.com

Abstract

Automated construction of knowledge hierarchies is gaining increasing attention to tackle the infeasibility of manually extracting and seman- tically linking millions of concepts. With the evolution of knowledge hierarchies, there is a need for measures to assess its temporal evolution, quantifying the similarities between different versions and identifying the relative growth of different subgraphs in the knowledge hierarchy.

This work proposes a principled and scalable similarity measure, based on Katz similarity between concept nodes, for comparing knowledge hierarchies, modeled as generic Directed Acyclic Graphs (DAGs).

1 Contribution

Large knowledge repositories like DBpedia [6] represent concepts and relations as hierarchies expressing seman- tic connections via parent-child orhypernym-hyponymedges. These hierarchies (typically represented as DAGs) form the backbone of semantic search, personalization, recommendation and textual entailment, enabling easy navigation across concepts for information linking [1, 7, 4, 3] As the knowledge hierarchies evolve, there is a need for taxonomy evaluation, i.e., comparing hierarchies and quantifying their similarity. Intuitively, a principled similarity measure should demonstrate the following properties: (1)Sensitivity to Concept Hierarchy, (2) Prox- imity of Least Common Ancestor, and (3)Importance of Relationship. Additionally, the practicality of assessing large hierarchies imposes the factors ofscalability,tunability, andinterpretabilityfor diverse applications.

Proposed Measure: To model similarities between DAGs, we adapt the Katz similarity measure [5], which captures multiple short directed paths between vertices, providing a good indicator of semantic subsumption.

We define the notion of Katz Similarity Vector representing each concept node to obtain the Katz Graph Similaritymeasure. We theoretically show that our proposed measure conforms to the above salient properties.

Further, for scenarios where computing the Katz Graph Similarity might be expensive, we propose a faster approximate variant, theGrouped Katz Similaritymeasure. We also show that theKatz Index[5], a centrality measure capturing the influence of a vertex in a graph, is in fact a special case of the Grouped Katz Similarity (with 1 group) for measuring acyclic graph similarity. This provides the Katz Index Graph Similarity measure, a linear time variant for computing the similarity between hierarchies; albeit with a slight loss in structural information due to aggressive grouping. Interestingly, the above similarity measures demonstrates monotonic behavior with respect to the degree of structural difference between the DAGs.

We performed large-scale experiments on different sub-hierarchies of the DBpedia knowledge graph with varying the number of concept nodes. We demonstrated that our proposed measures were not only scalable (with run-time improvement of ∼10000×compared to state-of-the-art Fowlkes-Mallows measure [2]), but also captures the structure and logical subsumption of concept relations in hierarchies. Additionally, we show that our measures are able to assess thetemporal evolution of concepts in correspondence with category disruptions.

Copyrightc by the paper’s authors. Copying permitted for private and academic purposes.

In: Joint Proceedings of the First International Workshop on Professional Search (ProfS2018); the Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR); and the International Workshop on Data Search (DATA:SEARCH18). Co-located with SIGIR 2018, Ann Arbor, Michigan, USA – 12 July 2018, published at http://ceur-ws.org

59

(2)

References

[1] Fensel, D.: Spinning the Semantic Web: Bringing the World Wide Web to its full potential. MIT Press (2005)

[2] Fowlkes, E.B., Mallows, C.L.: A method for comparing two hierarchical clusterings. American Statistical Association78(1983) 553–569

[3] Geffet, M., Dagan, I.: The distributional inclusion hypotheses and lexical entailment. In: ACL. (2005) 107–114

[4] Harabagiu, S.M., Maiorano, S.J., Pa¸sca, A., M.: Open-domain textual question answering techniques. Natural Language Engineering 9(3) (2003) 231–267

[5] Katz, L.: A new status index derived from sociometric analysis. Psychometrika18(1) (1953) 39–43

[6] Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web6(2) (2015) 167–195

[7] Maedche, A., Staab, S.: Measuring similarity between ontologies. In: EKAW. (2002) 251–263

60

Références

Documents relatifs

On the example of scientific citations we showed that it allows us to quantify the influence of a given paper (node) on the others, to identify seminal and innovative papers

With respect to our 2013 partici- pation, we included a new feature to take into account the geographical context and a new semantic distance based on the Bhattacharyya

The proposed similarity measure is composed of three sub-measures: (i) the first one exploits the taxonomic struc- ture of ontological concepts used to instantiate resources, (ii)

In Section 3, we give a new similarity measure that characterize the pairs of attributes which can increase the size of the concept lattice after generalizing.. Section 4 exposes

We tested that the network had learned feature vectors which were representative of the original cases by measuring the euclidean distance between pair vectors at the similarity

So, in explanatory coherence we seek to determine the acceptance or rejection of hypotheses based on how they cohere and incohere with given evidence or with competing hypothe- ses;

In addition to ontology refinement, there are some existing ontology evaluation methods, but these evaluation methods use a scale of ontology or coverage of required

Since the original Euclidean distance is not suit for complex manifold data structure, we design a manifold similarity measurement and proposed a K-AP clustering algorithm based