• Aucun résultat trouvé

Key Phrase to Text Similarity, Clustering, and Interpretation in Hierarchical Ontologies

N/A
N/A
Protected

Academic year: 2022

Partager "Key Phrase to Text Similarity, Clustering, and Interpretation in Hierarchical Ontologies"

Copied!
1
0
0

Texte intégral

(1)

Key Phrase to Text Similarity, Clustering, and Interpretation in Hierarchical Ontologies

Boris Mirkin

Applied Mathematics and Informatics, National Research University Higher School of Economics Moscow

Computer Science and Information Systems, Birkbeck University of London BMirkin@hse.ru

Abstract. Scoring similarity between key phrases and unstructured texts is an issue which is important in both information retrieval and text analysis. Researchers from the two fields use different scoring functions, although clear delineation between the two still is lacking. We use suffix tree based score expressing the average conditional probability of a sym- bol in a common substring. Usually, a domain taxonomy serves as the source of key-phrases. Given a set of entities, such as texts or projects or working groups, one can derive clusters of key-phrases using key-phrase- to-entity scores. The clusters represent common themes in the meaning of texts or in activities of working groups. To interpret them, the domain ontology should be used. If the ontology is a rooted tree, a lifting method is proposed to find the most parsimonious interpreting head subject(s), up to a few gaps and offshoots. Some applications and application is- sues are considered. The work is being conducted jointly with T. Fenner (London), S. Nascimento (Lisbon) and E. Chernyak (Moscow).

Références

Documents relatifs

Assuming randomly drawn training pairs, we prove tight upper and lower bounds on the achievable misclassification error on unseen pairs, the bridge between the two settings

It allows a better scalability compared to the conventional AHC al- gorithm both in memory and processing time, and its results are deterministic unlike some aforementioned

In this paper, we present an attack method- ology that can be used to extract the text typed by a user from the sound recorded by the built-in microphones of a mobile phone.. We

In this paper a method for pairwise text similarity on massive data-sets, using the Cosine Similarity metric and the tf-idf (Term Frequency- Inverse

Scores obtained demonstrate that models trained at the harder levels of difficulty are more ro- bust to noise and achieve better results.. As before, also in this case MLP is,

In this paper, we use a phrase-based topic model approach and formal topical concept analysis on unstructured text in this domain to suggest additional concepts and axioms for

Semantic similarity is used to include similarity measures that use an ontology’s structure or external knowledge sources to determine similarity between

Lexical features are extracted from all the legal documents and the simi- larity between each current case document and all the prior case documents are determined using